Towards robustness to speech rate in mandarin all-syllable recognition

Chen, YiNing; Zhu, Xuan; Liu, Jia; Liu, RunSheng

doi:10.1007/BF02945464

Published: November 2003

Volume 18, pages 756–761, (2003)

Journal of Computer Science and Technology

Abstract

In Mandarin all-syllable recognition, many insertion errors are caused by non-consonant syllables. Introducing a duration model into the recognition process is a direct way to reduce these errors, but it usually works less well than expected because duration is sensitive to speech rate. To address this problem, this paper proposes a novel context-dependent duration distribution normalized by speech rate and applies it to a speech recognition system built on an improved hidden Markov model (HMM) framework. To realize the algorithm, the authors use a new method to estimate the speech rate of a sentence, then compute the duration probability in combination with the speech rate, and finally apply this duration information in the post-processing stage. The duration model is thus incorporated efficiently, with little change to the recognition process or to resource demands. Experimental results show that syllable error rates decrease significantly on two different speech corpora; insertion errors in particular are reduced by about 60 to 80 percent.
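The three steps named in the abstract (estimate a sentence-level speech rate, score syllable durations after normalizing by that rate, and apply the score in post-processing) can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which this page does not give. The Gaussian duration model, the ratio-based rate estimate, the context-independent `DURATION_STATS` table, the choice to estimate the rate from the first-pass best hypothesis, and every function name and number below are assumptions made only for illustration.

```python
import math

# Hypothetical duration statistics per syllable, in frames: (mean, std).
# A context-dependent model would key on neighbouring syllables as well.
DURATION_STATS = {"ni": (20.0, 4.0), "hao": (24.0, 5.0), "a": (18.0, 4.0)}


def estimate_rate(hypothesis):
    """Assumed rate estimate: observed total duration / expected total duration.

    A value above 1 means the sentence is slower than the training average.
    """
    observed = sum(duration for _, duration in hypothesis)
    expected = sum(DURATION_STATS[syllable][0] for syllable, _ in hypothesis)
    return observed / expected


def duration_log_prob(syllable, duration, rate):
    """Gaussian log-density of the rate-normalized duration (assumed model)."""
    mean, std = DURATION_STATS[syllable]
    z = (duration / rate - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def rescore(hypothesis, acoustic_score, rate, weight=1.0):
    """Post-processing score: acoustic log-score plus weighted duration term."""
    duration_term = sum(
        duration_log_prob(syllable, duration, rate)
        for syllable, duration in hypothesis
    )
    return acoustic_score + weight * duration_term


# Toy N-best list: (hypothesis, acoustic log-score). The first entry contains
# a spurious 6-frame "a" but has the slightly better acoustic score.
with_insertion = ([("ni", 30), ("hao", 30), ("a", 6)], -98.0)
without_insertion = ([("ni", 30), ("hao", 36)], -100.0)
n_best = [with_insertion, without_insertion]

# Estimate the sentence rate once, from the first-pass best hypothesis.
first_pass_best = max(n_best, key=lambda item: item[1])
rate = estimate_rate(first_pass_best[0])

rescored = [(rescore(hyp, score, rate), hyp) for hyp, score in n_best]
best_score, best_hypothesis = max(rescored, key=lambda item: item[0])
print([syllable for syllable, _ in best_hypothesis])
```

In this toy case the very short inserted syllable receives a low duration probability even after rate normalization, so the duration term outweighs the small acoustic advantage of the hypothesis containing it. Because the score is applied to finished hypotheses, the first-pass decoder is left unchanged, which matches the abstract's point about adding the duration model in the post-processing stage.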



Author information

Authors and Affiliations

Department of Electronic Engineering, Tsinghua University, 100084, Beijing, P.R. China
Chen YiNing, Zhu Xuan, Liu Jia & Liu RunSheng

Corresponding author

Correspondence to Chen YiNing.

Additional information

The research is supported by the National Natural Science Foundation of China (Grant No. 60272016).

CHEN YiNing received the B.S. and M.S. degrees in electronic engineering from Tsinghua University in 1999 and 2001 respectively. He is a Ph.D. candidate at the Department of Electronic Engineering of Tsinghua University. His research interests are speech recognition and spoken language processing.

ZHU Xuan received the B.S. degree in electronic engineering from Beijing University of Astronautics and Aeronautics in 1998 and the M.S. degree in electronic engineering from Tsinghua University in 2001. She is a Ph.D. candidate at the Department of Electronic Engineering of Tsinghua University. Her research interests are speech recognition and embedded signal processing system design.

LIU Jia received the B.S., M.S. and Ph.D. degrees in electronic engineering from Tsinghua University in 1983, 1986 and 1990 respectively, and was a postdoctoral researcher at Cambridge University from 1992 to 1994. He is a professor at the Department of Electronic Engineering of Tsinghua University. His research interests are speech recognition, speech coding, speech synthesis and speech ASIC design. Dr. Liu is a member of IEEE and a senior member of the China Institute of Electronics.

LIU RunSheng received the B.S. degree in electronic engineering from Tsinghua University in 1958. He is a professor at the Department of Electronic Engineering of Tsinghua University. His research interests are speech recognition, IC design and CAD.

About this article

Cite this article

Chen, Y., Zhu, X., Liu, J. et al. Towards robustness to speech rate in mandarin all-syllable recognition. J. Comput. Sci. & Technol. 18, 756–761 (2003). https://doi.org/10.1007/BF02945464


Received: 26 April 2002
Revised: 14 February 2003
Issue Date: November 2003
DOI: https://doi.org/10.1007/BF02945464
