Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources

Ngo, Gia H.; Nguyen, Minh; Chen, Nancy F.

doi:10.1109/TASLP.2018.2875269

arXiv:1810.03184 (cs)

[Submitted on 7 Oct 2018]

Abstract: Transliteration converts words in a source language (e.g., English) into words in a target language (e.g., Vietnamese). The conversion must respect the phonological structure of the target language, because the transliterated output needs to be pronounceable in that language. For example, a Vietnamese word that begins with a consonant cluster is phonologically invalid and would therefore be an incorrect output of a transliteration system. Most statistical transliteration approaches, although widely adopted, do not explicitly model the target language's phonology, which often results in invalid outputs. The problem is compounded by the limited linguistic resources available for converting foreign words into transliterated words in the target language. In this work, we present a phonology-augmented statistical framework suitable for transliteration, especially when only limited linguistic resources are available. We propose the concept of pseudo-syllables: structures that represent how segments of a foreign word are organized according to the syllables of the target language's phonology. We performed transliteration experiments on Vietnamese and Cantonese. The proposed framework outperforms the statistical baseline by up to 44.68% relative when training examples are limited (587 entries).
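
The consonant-cluster example in the abstract can be illustrated with a minimal sketch. This is not the paper's framework or its pseudo-syllable model; it is a toy post-hoc filter built on simplifying assumptions. The vowel inventory `VOWELS`, the function names, and the rule "at most one consonant before the first vowel" are all hypothetical and chosen only to show what rejecting a phonologically invalid candidate could look like.

```python
from typing import List, Sequence, Set

# Hypothetical toy vowel inventory (an assumption for illustration only);
# a real system would use the target language's actual phoneme set.
VOWELS: Set[str] = {"a", "e", "i", "o", "u"}


def onset_length(phonemes: Sequence[str], vowels: Set[str] = VOWELS) -> int:
    """Count the consonants that precede the first vowel."""
    for index, phoneme in enumerate(phonemes):
        if phoneme in vowels:
            return index
    # No vowel found: every phoneme counts as part of the onset.
    return len(phonemes)


def has_valid_onset(
    phonemes: Sequence[str],
    max_onset: int = 1,
    vowels: Set[str] = VOWELS,
) -> bool:
    """Accept a candidate only if it contains a vowel and does not start
    with a consonant cluster (assumed limit: one onset consonant)."""
    has_vowel = any(p in vowels for p in phonemes)
    return has_vowel and onset_length(phonemes, vowels) <= max_onset


def filter_candidates(candidates: Sequence[Sequence[str]]) -> List[List[str]]:
    """Keep only the candidates that pass the simplified onset check."""
    return [list(c) for c in candidates if has_valid_onset(c)]


if __name__ == "__main__":
    # "s t o p" starts with a two-consonant cluster; "t o p" does not.
    candidates = [["s", "t", "o", "p"], ["t", "o", "p"]]
    print(filter_candidates(candidates))
```

A filter like this only discards invalid outputs after the fact. The abstract describes a different design, in which the target language's syllable structure is built into the statistical model itself.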

Comments:	Accepted by IEEE Transactions on Audio, Speech and Language Processing. Copyright 2018 IEEE
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1810.03184 [cs.CL]
	(or arXiv:1810.03184v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1810.03184
Journal reference:	IEEE/ACM Transactions on Audio, Speech and Language Processing. 27 (2019) 199-211
Related DOI:	https://doi.org/10.1109/TASLP.2018.2875269

Submission history

From: Gia Ngo
[v1] Sun, 7 Oct 2018 17:32:11 UTC (1,079 KB)
