Low-Resource Machine Transliteration Using Recurrent Neural Networks

Published: 16 January 2019


Grapheme-to-phoneme models are key components in automatic speech recognition and text-to-speech systems. They are particularly useful for low-resource language pairs that lack well-developed pronunciation lexicons. These models are based on initial alignments between grapheme source and phoneme target sequences. Inspired by sequence-to-sequence recurrent neural network-based translation methods, this research presents an approach that applies an alignment representation for input sequences, together with pretrained source and target embeddings, to address the transliteration problem for a low-resource language pair. Experiments on French and Vietnamese showed that, with only a small bilingual pronunciation dictionary available for training the transliteration models, the approach obtained promising results: a large increase in BLEU scores and a reduction in Translation Error Rate (TER) and Phoneme Error Rate (PER). We also compared the proposed neural network-based transliteration approach with a statistical one.
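The abstract reports Phoneme Error Rate (PER) but does not define it here. As a rough illustration, the sketch below assumes the common definition: the total Levenshtein edit distance between predicted and reference phoneme sequences, divided by the total number of reference phonemes. The function names and the placeholder phoneme tokens are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a hypothesis token
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Corpus-level PER (assumed definition): total edits / total reference phonemes."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_ref = sum(len(r) for r in references)
    if total_ref == 0:
        raise ValueError("references contain no phonemes")
    return total_edits / total_ref


# Placeholder tokens stand in for phoneme symbols; they are not real data.
refs = [["p1", "p2", "p3"], ["p4", "p5"]]
hyps = [["p1", "px", "p3"], ["p4"]]
per = phoneme_error_rate(refs, hyps)  # one substitution + one deletion over 5 phonemes
```

Aggregating edits over the whole corpus, rather than averaging per-word rates, keeps short words from dominating the score. The paper may use a different convention.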





    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 18, Issue 2
    June 2019
    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 16 January 2019
    Accepted: 01 August 2018
    Revised: 01 May 2018
    Received: 01 February 2018


    Author Tags

    1. French-Vietnamese
    2. Machine transliteration
    3. alignment
    4. embeddings
    5. grapheme-to-phoneme
    6. low-resource language
    7. recurrent neural networks


