Abstract
In this paper, we model a Perso-Arabic to Latin transliteration system as a combination of grapheme-to-phoneme (G2P) and word lattice methods with statistical machine translation (SMT). Persian, a member of the Indo-Iranian branch of the Indo-European language family, is written in an Arabic-based script. Our transliteration model is induced from a parallel corpus consisting of a Persian book in Perso-Arabic script together with its Romanized transcription in Dabire. We manually aligned the sentences of the book across the two scripts to build this corpus. Our results indicate that adding grapheme-to-phoneme and word lattice methods for out-of-vocabulary handling to the monotonic statistical machine transliteration system improves its performance. In addition, the final performance on the test corpus shows that our system achieves results comparable to those of other state-of-the-art systems.
Notes
1. This dataset is freely available and can be obtained by contacting the corresponding authors.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Hemmati, N., Faili, H., Maleki, J. (2018). Multiple System Combination for PersoArabic-Latin Transliteration. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10762. Springer, Cham. https://doi.org/10.1007/978-3-319-77116-8_35
DOI: https://doi.org/10.1007/978-3-319-77116-8_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77115-1
Online ISBN: 978-3-319-77116-8
eBook Packages: Computer Science (R0)