Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview

Butryna, Alena; Chu, Shan-Hui Cathy; Demirsahin, Isin; Gutkin, Alexander; Ha, Linne; He, Fei; Jansche, Martin; Johny, Cibu; Katanova, Anna; Kjartansson, Oddur; Li, Chenfang; Merkulova, Tatiana; Oo, Yin May; Pipatsrisawat, Knot; Rivera, Clara; Sarin, Supheakmungkol; de Silva, Pasindu; Sodimana, Keshan; Sproat, Richard; Wattanavekin, Theeraphol; Wibawa, Jaka Aris Eko

arXiv:2010.06778 (cs)

[Submitted on 14 Oct 2020]

Abstract: This paper presents an overview of a program designed to address the growing need for developing freely available speech resources for under-represented languages. At present we have released 38 datasets for building text-to-speech and automatic speech recognition applications for languages and dialects of South and Southeast Asia, Africa, Europe and South America. The paper describes the methodology used for developing such corpora and presents some of our findings that could benefit under-represented language communities.

Comments:	Appeared in 2019 UNESCO International Conference Language Technologies for All (LT4All): Enabling Linguistic Diversity and Multilingualism Worldwide, 4-6 December, Paris, France
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2010.06778 [cs.CL]
	(or arXiv:2010.06778v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2010.06778

Submission history

From: Alexander Gutkin
[v1] Wed, 14 Oct 2020 02:24:04 UTC (16 KB)
