Oct 23, 2021 · The modality switch training randomly swaps speech and text embeddings based on the forced alignment result to learn a joint representation ...
ABSTRACT. The advances in attention-based encoder-decoder (AED) networks have brought great progress to end-to-end (E2E) automatic speech recognition (ASR).
This paper proposes an embedding aligner and modality switch training to better align the speech and text latent spaces and proves its effectiveness on ...
The modality switch training randomly swaps speech and text embeddings based on the forced alignment result to learn a joint representation space. Experimental ...
... Our technique decreases Librispeech ASR WER by 14% to 19%. We also tested its influence on spoken language understanding (SLU) and saw a 2.5% to 2.8% F1 ...
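The snippets above say only that modality switch training randomly swaps speech and text embeddings according to the forced alignment result. A minimal sketch of that idea follows, under several assumptions not stated in the source: embeddings are plain lists of floats with the same dimension in both modalities, the alignment gives one half-open frame span per text token, each token is swapped independently with probability `swap_prob`, and a swapped span is filled by repeating the token's text embedding over its frames. The function name `modality_switch` and all parameters are hypothetical; the paper's actual procedure may differ.

```python
import random


def modality_switch(speech_emb, text_emb, alignment, swap_prob=0.5, rng=None):
    """Return a copy of speech_emb with some aligned spans replaced by text embeddings.

    speech_emb: list of T frame vectors (lists of floats).
    text_emb:   list of N token vectors, same dimension as the frames (assumption).
    alignment:  list of N (start, end) frame spans, end exclusive, one per token,
                as a forced aligner might produce (assumed format).
    swap_prob:  per-token probability of swapping (hypothetical hyperparameter).
    """
    rng = rng or random.Random()
    mixed = [list(frame) for frame in speech_emb]  # copy so the input is untouched
    for token_idx, (start, end) in enumerate(alignment):
        if rng.random() < swap_prob:
            # Replace every speech frame aligned to this token with its text embedding.
            for t in range(start, end):
                mixed[t] = list(text_emb[token_idx])
    return mixed


# Toy example: 5 speech frames, 2 tokens aligned to frames [0, 2) and [2, 5).
speech = [[0.0, 0.0] for _ in range(5)]
text = [[1.0, 1.0], [2.0, 2.0]]
alignment = [(0, 2), (2, 5)]
mixed = modality_switch(speech, text, alignment, swap_prob=0.5, rng=random.Random(0))
```

With `swap_prob=0.0` the output is the unmodified speech sequence, and with `swap_prob=1.0` every aligned frame carries its token's text embedding; values in between yield the mixed-modality sequences that a shared encoder would be trained on.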
Nov 11, 2021 · Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding.
Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding ; ASR results. Adding text encoder trained with ...
Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding - W Wang et al, INTERSPEECH 2022; STPT: Unified ...
Mar 20, 2023 · Qian, and Michael Zeng, “Optimizing alignment of speech and language latent spaces for end-to-end speech recognition and understanding,” in ...
Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding. ICASSP 2022. Zhengyang Chen, Sanyuan Chen, Yu ...