Aug 27, 2021 · In this paper, we propose to jointly learn representations during pretraining from two different modalities: speech and text.
Unspoken text is complementary to un-transcribed speech in self-supervised learning. It is also much easier to collect than un-transcribed speech. Pretraining ...
Self-supervised pretraining for Automated Speech Recognition (ASR) has shown varied degrees of success. In this paper, we propose to jointly learn ...
Aug 30, 2021 · We demonstrate that this novel pretraining method yields Word Error Rate (WER) reductions of 10% relative on the well-benchmarked, Librispeech ...
The TTS4ASR line of work [23, 24, 25] uses an approach similar to dual learning, in which a TTS model is used to provide supervision for unpaired audio before ...
Aug 31, 2021 · In this tutorial I explain the paper "Injecting Text in Self-Supervised Speech Pre-Training" by Zhehuai Chen, Yu Zhang, Andrew Rosenberg, ...
Apr 28, 2022 · Combines pretraining on untranscribed speech with pretraining on text that has no paired speech, by using TTS. ASR also [benefits from] self-supervision ...
Dec 14, 2023 · Injecting text into the self-supervised speech pre-training task has been widely studied, including ...
Self-supervised learning from speech signals aims to learn the latent structure inherent in the signal, while self-supervised learning from text attempts to ...
May 22, 2022 · Abstract. We describe a method to jointly pre-train speech and text in an encoder-decoder modeling framework for speech translation and ...