Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition

Guo, Pengcheng; Xu, Haihua; Xie, Lei; Chng, Eng Siong


arXiv:1806.06200 (cs)

[Submitted on 16 Jun 2018]

Abstract: In this paper, we present our overall efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods, from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data. We first investigate a semi-supervised lexicon learning approach to adapt the canonical lexicon, which is meant to alleviate the heavily accented pronunciation found in the code-switching conversations of the local area. The learned lexicon yields improved performance. Furthermore, we use semi-supervised training to deal with transcriptions that are highly mismatched between the human transcribers and the ASR system. Specifically, we conduct semi-supervised training by treating the poorly transcribed data as unsupervised data, and we find that semi-supervised acoustic modeling leads to improved results. Finally, to compensate for the limitations of conventional n-gram language models caused by data sparsity, we perform lattice rescoring using neural network language models and obtain a significant reduction in word error rate (WER).
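
As an illustration of the data-selection idea in the abstract, the following is a minimal sketch of how utterances whose human transcript disagrees strongly with an ASR hypothesis might be set aside as unsupervised data. It is not the authors' implementation: the function names, the whitespace tokenization, and the 0.5 threshold are all assumptions made for this example, and the abstract does not state how mismatch is measured.

```python
from typing import List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Token-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mismatch_rate(human: str, asr: str) -> float:
    """WER-style mismatch of the ASR hypothesis against the human transcript.

    Assumes both strings are already whitespace-tokenized (hypothetical
    preprocessing; Mandarin text would need segmentation first).
    """
    ref, hyp = human.split(), asr.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def split_by_mismatch(
    utts: List[Tuple[str, str, str]], threshold: float = 0.5
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split (utt_id, human_transcript, asr_hypothesis) triples.

    Utterances above the (assumed) threshold lose their transcript and are
    returned as unsupervised IDs; the rest keep the human transcript.
    """
    supervised, unsupervised = [], []
    for utt_id, human, asr in utts:
        if mismatch_rate(human, asr) > threshold:
            unsupervised.append(utt_id)
        else:
            supervised.append((utt_id, human))
    return supervised, unsupervised
```

In a real system the threshold would be tuned on held-out data, and the unsupervised portion would then be decoded and used for semi-supervised acoustic model training.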

Comments:	5 pages, 3 figures, INTERSPEECH 2018
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1806.06200 [cs.CL]
	(or arXiv:1806.06200v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1806.06200

Submission history

From: Lei Xie
[v1] Sat, 16 Jun 2018 07:18:22 UTC (40 KB)
