DOI: 10.3115/1218955.1218976

A joint source-channel model for machine transliteration

Published: 21 July 2004

Abstract

Most foreign names are transliterated into Chinese, Japanese or Korean with approximate phonetic equivalents. The transliteration is usually achieved through an intermediate phonemic mapping. This paper presents a new framework that allows direct orthographical mapping (DOM) between two different languages through a joint source-channel model, also called the n-gram transliteration model (TM). With the n-gram TM, we automate the orthographic alignment process to derive the aligned transliteration units from a bilingual dictionary. The n-gram TM under the DOM framework greatly reduces system development effort and substantially improves transliteration accuracy over other state-of-the-art machine learning algorithms. The modeling framework is validated through several experiments on the English-Chinese language pair.
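
As a rough illustration of the idea in the abstract, the sketch below scores an aligned name pair with a bigram model over transliteration units, where each unit pairs an English letter chunk with a Chinese character. It is a minimal sketch under stated assumptions, not the paper's implementation: the toy alignments, the function names `train_bigram` and `joint_log_prob`, the choice of n = 2, and the absence of smoothing are all hypothetical, and the alignments are given by hand rather than derived automatically from a bilingual dictionary.

```python
from collections import Counter
import math

# Hypothetical toy data: each name is a hand-aligned sequence of
# (English unit, Chinese unit) pairs. Not taken from the paper's corpus.
ALIGNED = [
    [("s", "史"), ("mi", "密"), ("th", "斯")],
    [("s", "史"), ("mi", "米"), ("th", "斯")],
    [("s", "史"), ("mi", "密"), ("th", "斯")],
]

# Start-of-name marker used as the history of the first unit pair.
START = ("<s>", "<s>")


def train_bigram(aligned):
    """Count unit-pair bigrams and how often each pair occurs as a history."""
    bigrams, histories = Counter(), Counter()
    for seq in aligned:
        prev = START
        for pair in seq:
            bigrams[(prev, pair)] += 1
            histories[prev] += 1
            prev = pair
    return bigrams, histories


def joint_log_prob(seq, bigrams, histories):
    """Log joint probability of an aligned pair sequence under the bigram model.

    Uses unsmoothed relative frequencies, so any unseen bigram gives -inf.
    """
    logp, prev = 0.0, START
    for pair in seq:
        count = bigrams[(prev, pair)]
        if count == 0:
            return float("-inf")
        logp += math.log(count / histories[prev])
        prev = pair
    return logp


bigrams, histories = train_bigram(ALIGNED)
for seq in ALIGNED[:2]:
    chinese = "".join(c for _, c in seq)
    print(chinese, math.exp(joint_log_prob(seq, bigrams, histories)))
```

Because each unit carries both the source and the target side, a single n-gram model captures the context on both sides at once; a practical system would add smoothing and search over candidate segmentations instead of relying on fixed alignments.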

Published In

ACL '04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
July 2004
729 pages

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%

Cited By

  • (2015) A Dirichlet process mixture based name origin clustering and alignment model for transliteration. Advances in Artificial Intelligence, 2015, pp. 4-4. DOI: 10.1155/2015/927063. Online publication date: 1-Jan-2015
  • (2014) Machine Learning Approach for Language Identification & Transliteration. Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation, pp. 60-64. DOI: 10.1145/2824864.2824877. Online publication date: 5-Dec-2014
  • (2014) A Hybrid Approach for Transliterated Word-Level Language Identification. Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation, pp. 54-59. DOI: 10.1145/2824864.2824876. Online publication date: 5-Dec-2014
  • (2013) A Bayesian Alignment Approach to Transliteration Mining. ACM Transactions on Asian Language Information Processing, 12:3, pp. 1-22. DOI: 10.1145/2499955.2499957. Online publication date: 1-Aug-2013
  • (2013) MDL-based models for transliteration generation. Proceedings of the First international conference on Statistical Language and Speech Processing, pp. 200-211. DOI: 10.1007/978-3-642-39593-2_18. Online publication date: 29-Jul-2013
  • (2013) A Joint Source Channel Model for the English to Bengali Back Transliteration. Proceedings of the First International Conference on Mining Intelligence and Knowledge Exploration - Volume 8284, pp. 751-760. DOI: 10.1007/978-3-319-03844-5_73. Online publication date: 18-Dec-2013
  • (2012) Cost-benefit analysis of two-stage conditional random fields based English-to-Chinese machine transliteration. Proceedings of the 4th Named Entity Workshop, pp. 76-80. DOI: 10.5555/2392777.2392792. Online publication date: 12-Jul-2012
  • (2012) Syllable-based machine transliteration with extra phrase features. Proceedings of the 4th Named Entity Workshop, pp. 52-56. DOI: 10.5555/2392777.2392786. Online publication date: 12-Jul-2012
  • (2012) Latent semantic transliteration using dirichlet mixture. Proceedings of the 4th Named Entity Workshop, pp. 30-37. DOI: 10.5555/2392777.2392782. Online publication date: 12-Jul-2012
  • (2012) Report of NEWS 2012 machine transliteration shared task. Proceedings of the 4th Named Entity Workshop, pp. 10-20. DOI: 10.5555/2392777.2392779. Online publication date: 12-Jul-2012
