Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction

Ko, Yuka; Li, Sheng; Yang, Chao-Han Huck; Kawahara, Tatsuya

arXiv:2408.16180 (eess)

[Submitted on 29 Aug 2024 (v1), last revised 11 Oct 2024 (this version, v2)]

Abstract: With the strong representational power of large language models (LLMs), generative error correction (GER) for automatic speech recognition (ASR) aims to provide semantic and phonetic refinements that address ASR errors. This work explores how LLM-based GER can enhance and expand the capabilities of Japanese language processing, presenting the first GER benchmark for Japanese ASR with 0.9-2.6k text utterances. We also introduce a new multi-pass augmented generative error correction (MPA GER) method that integrates multiple system hypotheses on the input side with corrections from multiple LLMs on the output side and then merges them. To the best of our knowledge, this is the first investigation of LLMs for Japanese GER, which involves second-pass language modeling on the output transcriptions generated by the ASR system (e.g., N-best hypotheses). Our experiments show that the proposed methods improve ASR quality and generalization on both the SPREDS-U1-ja and CSJ data.
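
The data flow described in the abstract (several ASR systems feeding several LLM correctors, followed by a merge) can be illustrated with a small sketch. This is only a toy under stated assumptions, not the authors' implementation: the names `mpa_ger`, `merge_by_vote`, `pick_top`, and `pick_longest` are hypothetical, the "correctors" are plain functions standing in for LLM calls, and the merge step is a simple majority vote chosen for illustration because the abstract does not specify how the outputs are merged.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A corrector maps one system's N-best list to a single corrected transcript.
# In a real setup this would wrap an LLM call; here it is a plain function.
Corrector = Callable[[Sequence[str]], str]


def merge_by_vote(candidates: Sequence[str]) -> str:
    """Return the most frequent candidate; ties go to the earliest one.

    Assumption: majority voting is used only as a stand-in merge strategy.
    """
    if not candidates:
        raise ValueError("no candidates to merge")
    counts = Counter(candidates)
    best = max(counts.values())
    for cand in candidates:
        if counts[cand] == best:
            return cand
    raise AssertionError("unreachable")


def mpa_ger(nbest_per_system: Sequence[Sequence[str]],
            correctors: Sequence[Corrector]) -> str:
    """Run every corrector on every system's N-best list, then merge."""
    candidates: List[str] = [
        correct(nbest)
        for nbest in nbest_per_system   # input side: multiple ASR systems
        for correct in correctors       # output side: multiple correctors
    ]
    return merge_by_vote(candidates)


# Toy stand-ins for LLM correctors (hypothetical, for illustration only).
def pick_top(nbest: Sequence[str]) -> str:
    return nbest[0]


def pick_longest(nbest: Sequence[str]) -> str:
    return max(nbest, key=len)


if __name__ == "__main__":
    systems = [
        ["a b c", "a b d"],
        ["a b d", "a b c"],
        ["a b c", "a x c"],
    ]
    print(mpa_ger(systems, [pick_top, pick_longest]))
```

With real models, each corrector would prompt an LLM with the N-best list and return its rewritten transcript; the surrounding loop and merge step would stay the same shape.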

Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2408.16180 [eess.AS]
	(or arXiv:2408.16180v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2408.16180

Submission history

From: Sheng Li
[v1] Thu, 29 Aug 2024 00:18:12 UTC (295 KB)
[v2] Fri, 11 Oct 2024 04:01:42 UTC (295 KB)
