Speeding Up Target-Language Driven Part-of-Speech Tagger Training for Machine Translation

Sánchez-Martínez, Felipe; Pérez-Ortiz, Juan Antonio; Forcada, Mikel L.

doi:10.1007/11925231_81

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4293)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

When hidden-Markov-model-based part-of-speech (PoS) taggers used in machine translation systems are trained in an unsupervised manner, using target-language information has been shown to give better results than the standard Baum-Welch algorithm. The target-language-driven training algorithm translates into the target language every possible PoS tag sequence that results from disambiguating the words of each source-language text segment, and then uses a target-language model to estimate the likelihood of the translation of each possible disambiguation. The main disadvantage of this method is that the number of translations to perform grows exponentially with segment length, and translation is the most time-consuming task. In this paper, we present a method that uses a priori knowledge, obtained in an unsupervised manner, to prune unlikely disambiguations in each text segment, thereby reducing the number of translations performed during training. The experimental results show that this pruning method drastically reduces the number of translations done during training (and, consequently, the time complexity of the algorithm) without degrading tagging accuracy.
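The abstract only outlines the procedure, so the following Python sketch illustrates the idea on a toy segment. It enumerates every disambiguation (their number is the product of the per-word ambiguity sizes), prunes them with an a priori tag model, and weights the survivors by the target-language likelihood of their translations. Everything in it is hypothetical: the word-for-word `LEXICON` stands in for the MT system, the tag-bigram tables `START`/`TRANS` stand in for the a priori knowledge, `TL_BIGRAM` stands in for the target-language model, and the cumulative-probability-mass rule (`mass`) is an assumed pruning criterion, not necessarily the exact one used in the paper.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

# A source-language segment: each word paired with its possible PoS tags.
Segment = List[Tuple[str, List[str]]]
Disamb = Tuple[str, ...]

# Hypothetical toy models (assumptions for illustration, not from the paper).
START = {"PRN": 1.0}                                   # a priori tag model:
TRANS = {"PRN": {"MOD": 0.6, "NOUN": 0.1},             # P(tag_i | tag_{i-1})
         "MOD": {"VERB": 0.9, "NOUN": 0.05},
         "NOUN": {"VERB": 0.25, "NOUN": 0.2}}
LEXICON = {("he", "PRN"): "él", ("can", "MOD"): "puede", ("can", "NOUN"): "lata",
           ("fish", "VERB"): "pescar", ("fish", "NOUN"): "pez"}
TL_BIGRAM = {("<s>", "él"): 0.1, ("él", "puede"): 0.2, ("puede", "pescar"): 0.1,
             ("puede", "pez"): 0.001, ("él", "lata"): 0.001,
             ("lata", "pescar"): 0.001, ("lata", "pez"): 0.001}


def all_disambiguations(segment: Segment) -> List[Disamb]:
    """Every tag sequence obtained by choosing one tag per word; their number
    is the product of the per-word ambiguity sizes (exponential in length)."""
    return list(itertools.product(*(tags for _, tags in segment)))


def prior_prob(tags: Disamb) -> float:
    """A priori probability of a tag sequence under the toy tag-bigram model."""
    p = START.get(tags[0], 0.0)
    for prev, cur in zip(tags, tags[1:]):
        p *= TRANS.get(prev, {}).get(cur, 0.0)
    return p


def prune(disambs: List[Disamb], mass: float = 0.9) -> List[Disamb]:
    """Keep the most probable disambiguations until their normalized a priori
    probabilities add up to at least `mass` (assumed pruning criterion)."""
    probs = [prior_prob(d) for d in disambs]
    total = sum(probs)
    if total == 0.0:
        return list(disambs)  # no usable prior: translate everything
    kept, acc = [], 0.0
    for d, p in sorted(zip(disambs, probs), key=lambda x: x[1], reverse=True):
        kept.append(d)
        acc += p / total
        if acc >= mass:
            break
    return kept


def translate(segment: Segment, tags: Disamb) -> List[str]:
    """Hypothetical word-for-word translation of one disambiguation."""
    return [LEXICON[(word, tag)] for (word, _), tag in zip(segment, tags)]


def tl_log_likelihood(words: Sequence[str], floor: float = 1e-4) -> float:
    """Log-likelihood under the toy TL bigram model, flooring unseen bigrams."""
    padded = ["<s>", *words]
    return sum(math.log(TL_BIGRAM.get(bg, floor)) for bg in zip(padded, padded[1:]))


def tl_weights(segment: Segment, disambs: List[Disamb]) -> Dict[Disamb, float]:
    """Normalize the TL likelihoods of the translations so they sum to 1; such
    weights could serve as fractional counts when re-estimating the HMM."""
    logs = [tl_log_likelihood(translate(segment, d)) for d in disambs]
    top = max(logs)
    exps = [math.exp(ll - top) for ll in logs]  # subtract max for stability
    z = sum(exps)
    return {d: e / z for d, e in zip(disambs, exps)}


segment = [("he", ["PRN"]), ("can", ["MOD", "NOUN"]), ("fish", ["VERB", "NOUN"])]
disambs = all_disambiguations(segment)   # 1 * 2 * 2 = 4 candidates
kept = prune(disambs, mass=0.9)          # only these are translated
weights = tl_weights(segment, kept)
print(len(disambs), len(kept))           # prints "4 2"
```

In this toy run only 2 of the 4 candidate disambiguations reach the translation step. On longer segments the gap between enumerated and translated candidates widens, because the enumerated count grows exponentially with segment length while pruning keeps only the few sequences that carry most of the a priori probability mass.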

Author information

Authors and Affiliations

Transducens Group – Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03071, Alacant, Spain
Felipe Sánchez-Martínez, Juan Antonio Pérez-Ortiz & Mikel L. Forcada

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, 07738, Mexico City, México
Alexander Gelbukh
Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro No. 1, Sta. Ma. Tonanzintla, 72840, Puebla, México
Carlos Alberto Reyes-Garcia

About this paper

Cite this paper

Sánchez-Martínez, F., Pérez-Ortiz, J.A., Forcada, M.L. (2006). Speeding Up Target-Language Driven Part-of-Speech Tagger Training for Machine Translation. In: Gelbukh, A., Reyes-Garcia, C.A. (eds) MICAI 2006: Advances in Artificial Intelligence. MICAI 2006. Lecture Notes in Computer Science, vol 4293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11925231_81

DOI: https://doi.org/10.1007/11925231_81
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49026-5
Online ISBN: 978-3-540-49058-6
eBook Packages: Computer Science, Computer Science (R0)
