BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR

Liang, Yuhao; Yu, Fan; Li, Yangze; Guo, Pengcheng; Zhang, Shiliang; Chen, Qian; Xie, Lei

arXiv:2305.13716 (cs)

[Submitted on 23 May 2023 (v1), last revised 5 Oct 2023 (this version, v3)]

Abstract: The recently proposed serialized output training (SOT) simplifies multi-talker automatic speech recognition (ASR) by generating speaker transcriptions separated by a special token. However, frequent speaker changes can make speaker change prediction difficult. To address this, we propose boundary-aware serialized output training (BA-SOT), which explicitly incorporates boundary knowledge into the decoder via a speaker change detection task and a boundary constraint loss. We also introduce a two-stage connectionist temporal classification (CTC) strategy that incorporates token-level SOT CTC to restore temporal context information. Besides the typical character error rate (CER), we introduce the utterance-dependent character error rate (UD-CER) to further measure the precision of speaker change prediction. Compared to the original SOT, BA-SOT reduces CER/UD-CER by 5.1%/14.0%, and leveraging a pre-trained ASR model for BA-SOT model initialization further reduces CER/UD-CER by 8.4%/19.9%.
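To make the serialized target format concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that overlapping utterances are ordered by start time and joined with a separator token written here as `<sc>`; the `Utterance` class, the function names, and the token spelling are all hypothetical. It also includes a plain CER helper (character-level edit distance divided by reference length). UD-CER is not sketched, because the abstract does not give its definition.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str   # speaker label (hypothetical field, used only for illustration)
    start: float   # start time in seconds
    text: str      # transcription of this utterance


def serialize_sot(utterances, sc_token="<sc>"):
    """Join transcriptions in start-time order, separated by a speaker-change token.

    Assumption: first-in-first-out ordering by start time; "<sc>" is a placeholder name.
    """
    ordered = sorted(utterances, key=lambda u: u.start)
    return f" {sc_token} ".join(u.text for u in ordered)


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Plain character error rate: edit distance over reference length, ignoring spaces."""
    ref_chars = [c for c in ref if not c.isspace()]
    hyp_chars = [c for c in hyp if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


mixture = [
    Utterance("B", 1.2, "fine thanks"),
    Utterance("A", 0.0, "how are you"),
]
target = serialize_sot(mixture)
```

Here `target` is the single token stream a SOT-style decoder would be trained to emit for the two-speaker mixture; under the stated ordering assumption, speaker A's utterance comes first because it starts earlier.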

Comments:	Accepted by INTERSPEECH 2023
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2305.13716 [cs.SD]
	(or arXiv:2305.13716v3 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2305.13716

Submission history

From: Yuhao Liang
[v1] Tue, 23 May 2023 06:08:13 UTC (319 KB)
[v2] Tue, 30 May 2023 13:45:08 UTC (320 KB)
[v3] Thu, 5 Oct 2023 11:44:39 UTC (870 KB)
