This introductory text on statistical machine translation (SMT) provides all of the theories and methods needed to build a statistical machine translation system, such as Google Language Tools and Babelfish. In general, statistical techniques allow automatic translation systems to be built quickly for any language pair using only translated texts and generic software. With increasing globalization, statistical machine translation will be central to communication and commerce. Based on courses and tutorials, and classroom-tested globally, it is ideal for instruction or self-study, for advanced undergraduates and graduate students in computer science and/or computational linguistics, and for researchers in natural language processing. The companion website provides open-source corpora and toolkits.
Cited By
- Lai H and Nissim M (2024). A Survey on Automatic Generation of Figurative Language: From Rule-based Systems to Large Language Models, ACM Computing Surveys, 56:10, (1-34), Online publication date: 31-Oct-2024.
- Huang Z, Chen J, Jiang J, Liang Y, You H and Li F (2024). Mapping APIs in Dynamic-typed Programs by Leveraging Transfer Learning, ACM Transactions on Software Engineering and Methodology, 33:4, (1-29), Online publication date: 31-May-2024.
- Mondal S, Zhang H, Kabir H, Ni K and Dai H (2023). Machine translation and its evaluation: a study, Artificial Intelligence Review, 56:9, (10137-10226), Online publication date: 1-Sep-2023.
- Chakrabarty A, Dabre R, Ding C, Utiyama M and Sumita E (2023). Low-resource Multilingual Neural Translation Using Linguistic Feature-based Relevance Mechanisms, ACM Transactions on Asian and Low-Resource Language Information Processing, 22:7, (1-36), Online publication date: 31-Jul-2023.
- Bala Das S, Biradar A, Kumar Mishra T and Kr. Patra B (2023). Improving Multilingual Neural Machine Translation System for Indic Languages, ACM Transactions on Asian and Low-Resource Language Information Processing, 22:6, (1-24), Online publication date: 30-Jun-2023.
- Shi X, Huang H, Jian P and Tang Y (2023). Approximating to the Real Translation Quality for Neural Machine Translation via Causal Motivated Methods, ACM Transactions on Asian and Low-Resource Language Information Processing, 22:5, (1-26), Online publication date: 31-May-2023.
- Liu F, Li J and Zhang L Syntax and Domain Aware Model for Unsupervised Program Translation Proceedings of the 45th International Conference on Software Engineering, (755-767)
- Li D, Chen T, Zadikian A, Tung A and Chilton L Improving Automatic Summarization for Browsing Longform Spoken Dialog Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, (1-20)
- Kumar A, Mundotiya R, Pratap A and Singh A (2022). TLSPG, Journal of King Saud University - Computer and Information Sciences, 34:9, (6552-6563), Online publication date: 1-Oct-2022.
- Rivera-Trigueros I (2022). Machine translation systems and quality assessment: a systematic review, Language Resources and Evaluation, 56:2, (593-619), Online publication date: 1-Jun-2022.
- Satir E and Bulut H (2021). Preventing translation quality deterioration caused by beam search decoding in neural machine translation using statistical machine translation, Information Sciences: an International Journal, 581:C, (791-807), Online publication date: 1-Dec-2021.
- Premjith B and Soman K (2021). Deep Learning Approach for the Morphological Synthesis in Malayalam and Tamil at the Character Level, ACM Transactions on Asian and Low-Resource Language Information Processing, 20:6, (1-17), Online publication date: 30-Nov-2021.
- Wang Y, Wang Y, Dang K, Liu J and Liu Z (2021). A Comprehensive Survey of Grammatical Error Correction, ACM Transactions on Intelligent Systems and Technology, 12:5, (1-51), Online publication date: 31-Oct-2021.
- Shi X, Huang H, Jian P and Tang Y Reducing Length Bias in Scoring Neural Machine Translation via a Causal Inference Method Chinese Computational Linguistics, (3-15)
- Lalrempuii C, Soni B and Pakray P (2021). An Improved English-to-Mizo Neural Machine Translation, ACM Transactions on Asian and Low-Resource Language Information Processing, 20:4, (1-21), Online publication date: 31-Jul-2021.
- Chakravarthi B, Rani P, Arcan M and McCrae J (2021). A Survey of Orthographic Information in Machine Translation, SN Computer Science, 2:4, Online publication date: 1-Jul-2021.
- Dhamala J, Sun T, Kumar V, Krishna S, Pruksachatkun Y, Chang K and Gupta R BOLD Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, (862-872)
- Wang R and Ding B (2021). Research on Intelligent English Translation Method Based on the Improved Attention Mechanism Model, Scientific Programming, 2021, Online publication date: 1-Jan-2021.
- Duan C, Chen K, Wang R, Utiyama M, Sumita E, Zhu C and Zhao T (2021). Modeling Future Cost for Neural Machine Translation, IEEE/ACM Transactions on Audio, Speech and Language Processing, 29, (770-781), Online publication date: 1-Jan-2021.
- Duanzhu S, Zhang R and Jia C Bidirectional Boost: On Improving Tibetan-Chinese Neural Machine Translation With Back-Translation and Self-Learning Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, (1-6)
- Gros D, Sezhiyan H, Devanbu P and Yu Z Code to comment "translation" Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, (746-757)
- Daneshgar N and Sarmad M (2020). word.alignment: an R package for computing statistical word alignment and its evaluation, Computational Statistics, 35:4, (1597-1619), Online publication date: 1-Dec-2020.
- Phan H and Jannesari A Statistical machine translation outperforms neural machine translation in software engineering: why and how Proceedings of the 1st ACM SIGSOFT International Workshop on Representation Learning for Software Engineering and Program Languages, (3-12)
- Ameur M, Meziane F and Guessoum A (2020). Arabic Machine Translation, Computer Science Review, 38:C, Online publication date: 1-Nov-2020.
- Yang M, Wang X, Zhang M and Zhao T Incorporating Phrase-Level Agreement into Neural Machine Translation Natural Language Processing and Chinese Computing, (416-428)
- Deng Y, Huang H, Chen X, Liu Z, Wu S, Xuan J and Li Z From Code to Natural Language: Type-Aware Sketch-Based Seq2Seq Learning Database Systems for Advanced Applications, (352-368)
- Jónsson H, Símonarson H, Snæbjarnarson V, Steingrímsson S and Loftsson H Experimenting with Different Machine Translation Models in Medium-Resource Settings Text, Speech, and Dialogue, (95-103)
- Balashov Y (2020). The Translator’s Extended Mind, Minds and Machines, 30:3, (349-383), Online publication date: 1-Sep-2020.
- Yao K, Li H, Shang W and Hassan A (2020). A study of the performance of general compressors on log files, Empirical Software Engineering, 25:5, (3043-3085), Online publication date: 1-Sep-2020.
- Sulubacak U, Caglayan O, Grönroos S, Rouhe A, Elliott D, Specia L and Tiedemann J (2020). Multimodal machine translation through visuals and speech, Machine Translation, 34:2-3, (97-147), Online publication date: 1-Sep-2020.
- Lyons S (2020). A review of Thai–English machine translation, Machine Translation, 34:2-3, (197-230), Online publication date: 1-Sep-2020.
- Jabeen S, Gao X and Andreae P (2019). Semantic association computation: a comprehensive survey, Artificial Intelligence Review, 53:6, (3849-3899), Online publication date: 1-Aug-2020.
- Modarresi K Detecting the Most Insightful Parts of Documents Using a Regularized Attention-Based Model Computational Science – ICCS 2020, (272-281)
- Ranta A, Angelov K, Gruzitis N and Kolachina P (2020). Abstract Syntax as Interlingua, Computational Linguistics, 46:2, (425-486), Online publication date: 1-Jun-2020.
- Prates M, Avelar P and Lamb L (2019). Assessing gender bias in machine translation: a case study with Google Translate, Neural Computing and Applications, 32:10, (6363-6381), Online publication date: 1-May-2020.
- Li H, Huang G, Cai D and Liu L (2020). Neural Machine Translation With Noisy Lexical Constraints, IEEE/ACM Transactions on Audio, Speech and Language Processing, 28, (1864-1874), Online publication date: 1-Jan-2020.
- Mehndiratta A and Asawa K Recent Advances and Challenges in Design of Non-goal-Oriented Dialogue Systems Big Data Analytics, (33-43)
- Chinea-Rios M, Sanchis-Trilles G and Casacuberta F (2019). Discriminative ridge regression algorithm for adaptation in statistical machine translation, Pattern Analysis & Applications, 22:4, (1293-1305), Online publication date: 1-Nov-2019.
- Pathak A, Pakray P and Bentham J (2019). English–Mizo Machine Translation using neural and statistical approaches, Neural Computing and Applications, 31:11, (7615-7631), Online publication date: 1-Nov-2019.
- Tufano M, Watson C, Bavota G, Penta M, White M and Poshyvanyk D (2019). An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation, ACM Transactions on Software Engineering and Methodology, 28:4, (1-29), Online publication date: 31-Oct-2019.
- Berrichi S and Mazroui A Guiding word alignment with prior knowledge to improve English-Arabic Machine Translation Proceedings of the 4th International Conference on Big Data and Internet of Things, (1-5)
- Azpeitia A and Etchegoyhen T (2019). Efficient document alignment across scenarios, Machine Translation, 33:3, (205-237), Online publication date: 1-Sep-2019.
- Fan H, Wang J, Zhuang B, Wang S and Xiao J Automatic Acrostic Couplet Generation with Three-Stage Neural Network Pipelines PRICAI 2019: Trends in Artificial Intelligence, (314-324)
- Chen M, Lee B, Bansal G, Cao Y, Zhang S, Lu J, Tsay J, Wang Y, Dai A, Chen Z, Sohn T and Wu Y Gmail Smart Compose Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, (2287-2295)
- Chinea-Rios M, Sanchis-Trilles G and Casacuberta F (2019). Vector sentences representation for data selection in statistical machine translation, Computer Speech and Language, 56:C, (1-16), Online publication date: 1-Jul-2019.
- Khan Jadoon N, Anwar W, Bajwa U and Ahmad F (2019). Statistical machine translation of Indian languages: a survey, Neural Computing and Applications, 31:7, (2455-2467), Online publication date: 1-Jul-2019.
- Marzouk S and Hansen-Schirra S (2019). Evaluation of the impact of controlled language on neural machine translation compared to other MT architectures, Machine Translation, 33:1-2, (179-203), Online publication date: 1-Jun-2019.
- Calixto I and Liu Q (2019). An error analysis for image-based multi-modal neural machine translation, Machine Translation, 33:1-2, (155-177), Online publication date: 1-Jun-2019.
- Rahman M, Palani D and Rigby P Natural software revisited Proceedings of the 41st International Conference on Software Engineering, (37-48)
- Tran N, Tran H, Nguyen S, Nguyen H and Nguyen T Does BLEU score work for code migration? Proceedings of the 27th International Conference on Program Comprehension, (165-176)
- Ruder S, Vulić I and Søgaard A (2019). A survey of cross-lingual word embedding models, Journal of Artificial Intelligence Research, 65:1, (569-630), Online publication date: 1-May-2019.
- Ostaszewski M, Miszczak J, Banchi L and Sadowski P (2019). Approximation of quantum control correction scheme using deep neural networks, Quantum Information Processing, 18:5, (1-13), Online publication date: 1-May-2019.
- Kinghorn P, Zhang L and Shao L (2019). A hierarchical and regional deep learning architecture for image description generation, Pattern Recognition Letters, 119:C, (77-85), Online publication date: 1-Mar-2019.
- Schluter R, Beck E and Ney H (2019). Upper and Lower Tight Error Bounds for Feature Omission with an Extension to Context Reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence, 41:2, (502-514), Online publication date: 1-Feb-2019.
- Xia M, Huang G, Liu L and Shi S Graph based translation memory for neural machine translation Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, (7297-7304)
- Jain P, Mishra A, Azad A and Sankaranarayanan K Unsupervised controllable text formalization Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, (6554-6561)
- Maučec M and Brest J (2019). Slavic languages in phrase-based statistical machine translation, Artificial Intelligence Review, 51:1, (77-117), Online publication date: 1-Jan-2019.
- Coughlin R, Setthawong R and Setthawong P An Improved English-Thai Translation Framework for Non-timing Aligned Parallel Corpora Using Bleualign with Explicit Feedback Proceedings of the 10th International Conference on Advances in Information Technology, (1-8)
- Anderson P, Gould S and Johnson M Partially-supervised image captioning Proceedings of the 32nd International Conference on Neural Information Processing Systems, (1879-1890)
- Barmpoutis A Learning Programming Languages as Shortcuts to Natural Language Token Replacements Proceedings of the 18th Koli Calling International Conference on Computing Education Research, (1-10)
- He P, Chen Z, He S and Lyu M Characterizing the natural language descriptions in software logging statements Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, (178-189)
- Chen M Stochastic Gradient Descent Combines Second-Order Information for Training Neural Network Proceedings of the 2018 1st International Conference on Mathematics and Statistics, (69-73)
- Yin P, Deng B, Chen E, Vasilescu B and Neubig G Learning to mine aligned code and natural language pairs from stack overflow Proceedings of the 15th International Conference on Mining Software Repositories, (476-486)
- Phan H, Nguyen H, Tran N, Truong L, Nguyen A and Nguyen T Statistical learning of API fully qualified names in code snippets of online forums Proceedings of the 40th International Conference on Software Engineering, (632-642)
- Munigala V, Mishra A, Tamilselvam S, Khare S, Dasgupta R and Sankaran A PersuAIDE ! An Adaptive Persuasive Text Generation System for Fashion Domain Companion Proceedings of the The Web Conference 2018, (335-342)
- Fujita A and Isabelle P (2018). Expanding Paraphrase Lexicons by Exploiting Generalities, ACM Transactions on Asian and Low-Resource Language Information Processing, 17:2, (1-36), Online publication date: 5-Feb-2018.
- Zhou Q, Yang N, Wei F and Zhou M Sequential copying networks Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, (4987-4994)
- Grami G, Alkazemi B, Nour M, Naseer A and Al-Doobi H A Proposed Model to Address Current Errors in English into Arabic Machine Translation Proceedings of the 3rd International Conference on Robotics and Artificial Intelligence, (116-120)
- Chen C, Xing Z and Liu Y (2017). By the Community & For the Community, Proceedings of the ACM on Human-Computer Interaction, 1:CSCW, (1-21), Online publication date: 6-Dec-2017.
- Revanuru K, Turlapaty K and Rao S Neural Machine Translation of Indian Languages Proceedings of the 10th Annual ACM India Compute Conference, (11-20)
- Kazemi A, Toral A, Way A, Monadjemi A and Nematbakhsh M (2017). Syntax- and semantic-based reordering in hierarchical phrase-based statistical machine translation, Expert Systems with Applications: An International Journal, 84:C, (186-199), Online publication date: 30-Oct-2017.
- Liu L, Fujita A, Utiyama M, Finch A and Sumita E (2017). Translation Quality Estimation Using Only Bilingual Corpora, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:9, (1762-1772), Online publication date: 1-Sep-2017.
- Gulcehre C, Firat O, Xu K, Cho K and Bengio Y (2017). On integrating a language model into neural machine translation, Computer Speech and Language, 45:C, (137-148), Online publication date: 1-Sep-2017.
- Dauphin Y, Fan A, Auli M and Grangier D Language modeling with gated convolutional networks Proceedings of the 34th International Conference on Machine Learning - Volume 70, (933-941)
- Phan H, Nguyen H, Nguyen T and Rajan H Statistical learning for inference between implementations and documentation Proceedings of the 39th International Conference on Software Engineering: New Ideas and Emerging Results Track, (27-30)
- Phan H, Nguyen A, Nguyen T and Nguyen T Statistical migration of API usages Proceedings of the 39th International Conference on Software Engineering Companion, (47-50)
- Kim K, Park E, Shin J, Kwon O and Kim Y (2017). Divergence-based fine pruning of phrase-based statistical translation model, Computer Speech and Language, 41:C, (146-160), Online publication date: 1-Jan-2017.
- Poirier É Meaning-based content word alignment heuristic Proceedings of the 8th International Conference on Management of Digital EcoSystems, (208-214)
- Nguyen T, Rigby P, Nguyen A, Karanfil M and Nguyen T T2API: synthesizing API code usage templates from English texts with statistical translation Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, (1013-1017)
- Nguyen T Code migration with statistical machine translation Proceedings of the 5th International Workshop on Software Mining, (2-2)
- Saha Roy R, Agarwal S, Ganguly N and Choudhury M (2016). Syntactic complexity of Web search queries through the lenses of language models, networks and users, Information Processing and Management: an International Journal, 52:5, (923-948), Online publication date: 1-Sep-2016.
- Maletti A Compositions of Tree-to-Tree Statistical Machine Translation Models Proceedings of the 20th International Conference on Developments in Language Theory - Volume 9840, (293-305)
- Hindle A, Barr E, Gabel M, Su Z and Devanbu P (2016). On the naturalness of software, Communications of the ACM, 59:5, (122-131), Online publication date: 26-Apr-2016.
- Abdul-Rauf S, Schwenk H, Lambert P and Nawaz M (2016). Empirical use of information retrieval to build synthetic data for SMT domain adaptation, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:4, (745-754), Online publication date: 1-Apr-2016.
- Chu C, Nakazawa T and Kurohashi S (2015). Integrated Parallel Sentence and Fragment Extraction from Comparable Corpora, ACM Transactions on Asian and Low-Resource Language Information Processing, 15:2, (1-22), Online publication date: 1-Feb-2016.
- Bentivogli L, Bertoldi N, Cettolo M, Federico M, Negri M and Turchi M (2016). On the evaluation of adaptive machine translation for human post-editing, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:2, (388-399), Online publication date: 1-Feb-2016.
- Nguyen A, Nguyen T and Nguyen T Divide-and-conquer approach for multi-phase statistical migration for source code Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, (585-596)
- Oda Y, Fudaba H, Neubig G, Hata H, Sakti S, Toda T and Nakamura S Learning to generate pseudo-code from source code using statistical machine translation Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, (574-584)
- Fudaba H, Oda Y, Akabe K, Neubig G, Hata H, Sakti S, Toda T and Nakamura S Pseudogen Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, (824-829)
- Sordoni A, Bengio Y, Vahabi H, Lioma C, Grue Simonsen J and Nie J A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, (553-562)
- Wołk K and Marasek K Tuned and GPU-Accelerated Parallel Data Mining from Comparable Corpora Proceedings of the 18th International Conference on Text, Speech, and Dialogue - Volume 9302, (32-40)
- Guo J, Liu J, Chen X, Han Q and Zhou K Tunable Discounting Mechanisms for Language Modeling Revised Selected Papers, Part II, of the 5th International Conference on Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques - Volume 9243, (585-594)
- Liu X, Duh K and Matsumoto Y (2015). Multilingual Topic Models for Bilingual Dictionary Extraction, ACM Transactions on Asian and Low-Resource Language Information Processing, 14:3, (1-22), Online publication date: 12-Jun-2015.
- White M, Vendome C, Linares-Vásquez M and Poshyvanyk D Toward deep learning software repositories Proceedings of the 12th Working Conference on Mining Software Repositories, (334-345)
- White M Deep representations for software engineering Proceedings of the 37th International Conference on Software Engineering - Volume 2, (781-783)
- Tambouratzis G (2015). Conditional random fields versus template-matching in MT phrasing tasks involving sparse training data, Pattern Recognition Letters, 53:C, (44-52), Online publication date: 1-Feb-2015.
- Turner A, Brownstein M, Cole K, Karasz H and Kirchhoff K (2015). Modeling workflow to design machine translation applications for public health practice, Journal of Biomedical Informatics, 53:C, (136-146), Online publication date: 1-Feb-2015.
- Piqueras S, Del-Agua M, Giménez A, Civera J and Juan A Statistical Text-to-Speech Synthesis of Spanish Subtitles Proceedings of the Second International Conference on Advances in Speech and Language Technologies for Iberian Languages - Volume 8854, (40-48)
- Bredin H, Roy A, Pécheux N and Allauzen A "Sheldon speaking, Bonjour!" Proceedings of the 22nd ACM international conference on Multimedia, (137-146)
- Ture F and Lin J (2014). Exploiting Representations from Statistical Machine Translation for Cross-Language Information Retrieval, ACM Transactions on Information Systems, 32:4, (1-32), Online publication date: 28-Oct-2014.
- Karaivanov S, Raychev V and Vechev M Phrase-Based Statistical Translation of Programming Languages Proceedings of the 2014 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, (173-184)
- Green S, Chuang J, Heer J and Manning C Predictive translation memory Proceedings of the 27th annual ACM symposium on User interface software and technology, (177-187)
- Nguyen A, Nguyen H, Nguyen T and Nguyen T Statistical learning approach for mining API usage mappings for code migration Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, (457-468)
- Igarashi T TransDocument Proceedings of the 5th ACM international conference on Collaboration across boundaries: culture, distance & technology, (53-62)
- Sokolov A, Hieber F and Riezler S Learning to translate queries for CLIR Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, (1179-1182)
- Nguyen A, Nguyen H, Nguyen T and Nguyen T Statistical learning of API mappings for language migration Companion Proceedings of the 36th International Conference on Software Engineering, (618-619)
- Nguyen A, Nguyen T and Nguyen T Migrating code with statistical machine translation Companion Proceedings of the 36th International Conference on Software Engineering, (544-547)
- Tiedemann J Improved Text Extraction from PDF Documents for Large-Scale Natural Language Processing Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing - Volume 8403, (102-112)
- Lohar P, Bhaskar P, Pal S and Bandyopadhyay S Cross Lingual Snippet Generation Using Snippet Translation System Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing - Volume 8404, (331-342)
- Alabau V, Sanchis A and Casacuberta F (2014). Improving on-line handwritten recognition in interactive machine translation, Pattern Recognition, 47:3, (1217-1228), Online publication date: 1-Mar-2014.
- Sokolov A, Wisniewski G and Yvon F (2014). Lattice BLEU oracles in machine translation, ACM Transactions on Speech and Language Processing, 10:4, (1-29), Online publication date: 1-Dec-2013.
- Nguyen A, Nguyen T and Nguyen T Lexical statistical machine translation for language migration Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, (651-654)
- Sudoh K, Wu X, Duh K, Tsukada H and Nagata M (2013). Syntax-Based Post-Ordering for Efficient Japanese-to-English Translation, ACM Transactions on Asian Language Information Processing, 12:3, (1-15), Online publication date: 1-Aug-2013.
- Madnani N and Dorr B (2013). Generating targeted paraphrases for improved translation, ACM Transactions on Intelligent Systems and Technology, 4:3, (1-25), Online publication date: 1-Jun-2013.
- Brkić M, Seljan S and Vičić T Automatic and human evaluation on english-croatian legislative test set Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2, (311-317)
- Guo J, Liu J, Walsh M and Schmid H Class-Based language models for chinese-english parallel corpus Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2, (264-275)
- Pinnis M, Skadiņa I and Vasiļjevs A Domain adaptation in statistical machine translation using comparable corpora Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2, (224-235)
- Rychtyckyj N and Plesco C (2013). Applying Automated Language Translation at a Global Enterprise Level, AI Magazine, 34:1, (43-54), Online publication date: 1-Mar-2013.
- López-Ludeña V, San-Segundo R, González Morcillo C, López J and Pardo Muñoz J (2013). Increasing adaptability of a speech into sign language translation system, Expert Systems with Applications: An International Journal, 40:4, (1312-1322), Online publication date: 1-Mar-2013.
- Xiao T, Zhu J and Liu T (2013). Bagging and Boosting statistical machine translation systems, Artificial Intelligence, 195, (496-527), Online publication date: 1-Feb-2013.
- Wróblewska A and Przepiórkowski A Induction of dependency structures based on weighted projection Proceedings of the 4th international conference on Computational Collective Intelligence: technologies and applications - Volume Part I, (364-374)
- Isozaki H, Sudoh K, Tsukada H and Duh K (2012). HPSG-Based Preprocessing for English-to-Japanese Translation, ACM Transactions on Asian Language Information Processing, 11:3, (1-16), Online publication date: 1-Sep-2012.
- Büchse M, Maletti A and Vogler H Unidirectional derivation semantics for synchronous tree-adjoining grammars Proceedings of the 16th international conference on Developments in Language Theory, (368-379)
- Sofianopoulos S, Vassiliou M and Tambouratzis G Implementing a language-independent MT methodology Proceedings of the First Workshop on Multilingual Modeling, (1-10)
- Balahur A and Turchi M Multilingual sentiment analysis using machine translation? Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis, (52-60)
- Wang R, Osenova P and Simov K Linguistically-enriched models for Bulgarian-to-English machine translation Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation, (10-19)
- Fujita A, Isabelle P and Kuhn R Enlarging paraphrase collections through generalization and instantiation Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, (631-642)
- Pinnis M, Ion R, Ştefănescu D, Su F, Skadiņa I, Vasiļjevs A and Babych B ACCURAT toolkit for multi-level alignment and information extraction from comparable corpora Proceedings of the ACL 2012 System Demonstrations, (91-96)
- Ganitkevitch J, Van Durme B and Callison-Burch C Monolingual distributional similarity for text-to-text generation Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, (256-264)
- Maletti A Every sensible extended top-down tree transducer is a multi bottom-up tree transducer Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, (263-273)
- Hindle A, Barr E, Su Z, Gabel M and Devanbu P On the naturalness of software Proceedings of the 34th International Conference on Software Engineering, (837-847)
- Mayer T and Cysouw M Language comparison through sparse multilingual word alignment Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH, (54-62)
- Vu Hoang C and Aw A An unsupervised and data-driven approach for spell checking in Vietnamese OCR-scanned texts Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data, (36-44)
- Wang R, Osenova P and Simov K Linguistically-augmented Bulgarian-to-English statistical machine translation model Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra), (119-128)
- Dandapat S, Morrissey S, Way A and van Genabith J Combining EBMT, SMT, TM and IR technologies for quality and scale Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra), (48-58)
- Harriehausen-Mühlbauer B and Heuss T Semantic web based machine translation Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra), (1-9)
- Nikoulina V, Kovachev B, Lagos N and Monz C Adaptation of statistical machine translation model for cross-lingual information retrieval in a service context Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, (109-119)
- Martzoukos S and Monz C Power-law distributions for paraphrases extracted from bilingual corpora Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, (2-11)
- Okita T and van Genabith J Minimum bayes risk decoding with enlarged hypothesis space in system combination Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II, (40-51)
- Carpineto C and Romano G (2012). A Survey of Automatic Query Expansion in Information Retrieval, ACM Computing Surveys, 44:1, (1-50), Online publication date: 1-Jan-2012.
- Xiao T, Zhu J and Zhu M (2011). Language Modeling for Syntax-Based Machine Translation Using Tree Substitution Grammars, ACM Transactions on Asian Language Information Processing, 10:4, (1-29), Online publication date: 1-Dec-2011.
- Maletti A Tree transformations and dependencies Proceedings of the 12th biennial conference on The mathematics of language, (1-20)
- Greengard S (2011). Life, translated, Communications of the ACM, 54:8, (19-21), Online publication date: 1-Aug-2011.
- Galanis D and Androutsopoulos I A new sentence compression dataset and its use in an abstractive generate-and-rank sentence compressor Proceedings of the UCNLG+Eval: Language Generation and Evaluation Workshop, (1-11)
- López-Ludeña V, San-Segundo R, Lutfi S, Lucas-Cuesta J, Echevarry J and Martínez-González B Source language categorization for improving a speech into sign language translation system Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies, (84-93)
- Sánchez-Cartagena V, Sánchez-Martínez F and Pérez-Ortiz J The Universitat d'Alacant hybrid machine translation system for WMT 2011 Proceedings of the Sixth Workshop on Statistical Machine Translation, (457-463)
- López-Ludeña V and San-Segundo R UPM system for the translation task Proceedings of the Sixth Workshop on Statistical Machine Translation, (420-425)
- Zhang Y and Clark S Syntax-based grammaticality improvement using CCG and guided search Proceedings of the Conference on Empirical Methods in Natural Language Processing, (1147-1157)
- Malakasiotis P and Androutsopoulos I A generate and rank approach to sentence paraphrasing Proceedings of the Conference on Empirical Methods in Natural Language Processing, (96-106)
- Ture F, Elsayed T and Lin J No free lunch Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, (943-952)
- Na S and Ng H Enriching document representation via translation for improved monolingual information retrieval Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, (853-862)
- McCrae J, Espinoza M, Montiel-Ponsoda E, Aguado-de-Cea G and Cimiano P Combining statistical and semantic approaches to the translation of ontologies and taxonomies Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation, (116-125)
- Attardi G, Chanev A and Miceli Barone A A dependency based statistical translation model Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation, (79-87)
- Liu Z, Chen X, Zheng Y and Sun M Automatic keyphrase extraction by bridging vocabulary gap Proceedings of the Fifteenth Conference on Computational Natural Language Learning, (135-144)
- Schwartz L, Callison-Burch C, Schuler W and Wu S Incremental syntactic language models for phrase-based translation Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, (620-631)
- Ravi S and Knight K Deciphering foreign language Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, (12-21)
- Silvestre-Cerdà J, Andrés-Ferrer J and Civera J Explicit length modelling for statistical machine translation Proceedings of the 5th Iberian conference on Pattern recognition and image analysis, (273-280)
- Maletti A (2011). Survey: Weighted Extended Top-down Tree Transducers Part II—Application in Machine Translation, Fundamenta Informaticae, 112:2-3, (239-261), Online publication date: 1-Apr-2011.
- Fülöp Z, Maletti A and Vogler H (2011). Weighted Extended Tree Transducers, Fundamenta Informaticae, 111:2, (163-202), Online publication date: 1-Apr-2011.
- Lagoutte A and Maletti A Survey Algebraic Foundations in Computer Science, (272-308)
- Son L, Allauzen A, Wisniewski G and Yvon F Training continuous space language models Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, (778-788)
- de Gispert A, Pino J and Byrne W Hierarchical phrase-based translation grammars extracted from alignment posterior probabilities Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, (545-554)
- Kuhn R, Chen B, Foster G and Stratford E Phrase clustering for smoothing TM probabilities Proceedings of the 23rd International Conference on Computational Linguistics, (608-616)
- Isozaki H, Sudoh K, Tsukada H and Duh K Head finalization Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, (244-251)
- Abney S and Bird S The human language project Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, (88-97)
- Lopez A (2008). Statistical machine translation, ACM Computing Surveys, 40:3, (1-49), Online publication date: 1-Aug-2008.
- Liebeskind C, Liebeskind S and Bouhnik D Machine Translation for Historical Research: A case study of Aramaic-Ancient Hebrew Translations, Journal on Computing and Cultural Heritage, 0:0
- Tambouratzis G Applying PSO to natural language processing tasks: Optimizing the identification of syntactic phrases 2016 IEEE Congress on Evolutionary Computation (CEC), (1831-1838)
Index Terms
- Statistical Machine Translation
Recommendations
N-gram-based statistical machine translation versus syntax augmented machine translation: comparison and system combination
EACL '09: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven ...
Linguistically annotated BTG for statistical machine translation
COLING '08: Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Bracketing Transduction Grammar (BTG) is a natural choice for effective integration of desired linguistic knowledge into statistical machine translation (SMT). In this paper, we propose a Linguistically Annotated BTG (LABTG) for SMT. It conveys ...
Dependency-Based Chinese-English Statistical Machine Translation
CICLing '07: Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
We present a Chinese-English Statistical Machine Translation (SMT) system based on dependency tree mappings. We use a state-of-the-art dependency parser to parse the English translation of the Penn Chinese Treebank to make it bilingual and then learn a ...