DOI: 10.1145/3490099.3511164
Research article | Open access

BeParrot: Efficient Interface for Transcribing Unclear Speech via Respeaking

Published: 22 March 2022

Abstract

Transcribing speech from audio files to text is an important task, not only for exploring audio content in text form but also for using the transcribed data to train speech models such as automated speech recognition (ASR) models. A post-correction approach, in which users edit errors in the recognition results of an ASR model, has frequently been employed to reduce the time cost of transcription. However, this approach assumes clear speech and is not designed for unclear speech (such as speech with high levels of noise or reverberation), which severely degrades ASR accuracy and requires many manual corrections. To construct an alternative approach for transcribing unclear speech, we introduce the idea of respeaking, which has primarily been used to create real-time captions for television programs. In respeaking, a proficient human respeaker repeats the heard speech in a shadowing manner, and their utterances are recognized by an ASR model. While this approach can be effective for transcribing unclear speech, respeaking is a highly cognitively demanding task, and extensive training is often required to become a respeaker. We address this point with BeParrot, the first interface designed for respeaking that allows novice users to benefit from respeaking without extensive training, through two key features: parameter adjustment and pronunciation feedback. Our user study involving 60 crowd workers demonstrated that they could transcribe different types of unclear speech 32.2% faster with BeParrot than with a conventional approach, without losing transcription accuracy. In addition, comments from the workers supported the design of the adjustment and feedback features and expressed a willingness to continue using BeParrot for transcription tasks. Our work demonstrates how recent advances in machine learning can be leveraged, through a human-in-the-loop approach, to address tasks that remain challenging for computers alone.
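
To make the abstract's workflow concrete, below is a minimal Python sketch of the respeaking idea, under stated assumptions: `segments`, `record_respoken_audio`, and `asr_model.transcribe` are hypothetical placeholders, not BeParrot's actual implementation. A word error rate function is included because transcription accuracy is conventionally evaluated with edit-distance-based metrics.

```python
# Minimal sketch of respeaking-based transcription (all names below are
# hypothetical placeholders, not BeParrot's API). Instead of manually
# correcting ASR output on noisy audio, a human repeats each segment in a
# quiet environment, and ASR transcribes the clean repetition.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

def respeak_transcribe(segments, asr_model, record_respoken_audio):
    """Transcribe unclear audio by respeaking each segment.

    `segments` is an iterable of unclear audio chunks;
    `record_respoken_audio` plays a chunk to the human respeaker and
    records their clean repetition; `asr_model.transcribe` stands in
    for any ASR system.
    """
    transcript = []
    for segment in segments:
        respoken = record_respoken_audio(segment)          # human shadows the audio
        transcript.append(asr_model.transcribe(respoken))  # ASR on clean speech
    return " ".join(transcript)
```

In this framing, a respoken transcript would count as successful when its word error rate against a careful manual transcription is no worse than that of the conventional post-correction workflow, which is in the spirit of the speed-versus-accuracy comparison described in the abstract.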


Cited By

  • (2023) PrISM-Tracker. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 4, 1–27. https://doi.org/10.1145/3569504. Online publication date: 11 Jan 2023.
  • (2023) A Generative Framework for Designing Interactions to Overcome the Gaps between Humans and Imperfect AIs Instead of Improving the Accuracy of the AIs. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 1–5. https://doi.org/10.1145/3544549.3577036. Online publication date: 19 Apr 2023.

    Published In

IUI '22: Proceedings of the 27th International Conference on Intelligent User Interfaces
March 2022, 888 pages
ISBN: 9781450391443
DOI: 10.1145/3490099
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 22 March 2022

    Author Tags

    1. automated speech recognition
    2. respeak
    3. speech transcription

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    IUI '22

    Acceptance Rates

Overall Acceptance Rate: 746 of 2,811 submissions, 27%
