
Towards robustness to speech rate in mandarin all-syllable recognition

Journal of Computer Science and Technology

Abstract

In Mandarin all-syllable recognition, many insertion errors occur because of non-consonant syllables. Introducing a duration model into the recognition process is a direct way to reduce these errors, but it usually does not work as well as expected, because duration is sensitive to speech rate. To address this problem, this paper proposes a novel context-dependent duration distribution normalized by speech rate and applies it to a speech recognition system built on an improved Hidden Markov Model (HMM) framework. To realize the algorithm, the authors first estimate the speech rate of a sentence with a new method, then compute the duration probability in combination with the speech rate, and finally apply this duration information in the post-processing stage. The duration model is thus incorporated efficiently, with little change to the recognition process or resource demand. Experimental results show that syllable error rates decrease significantly on two different speech corpora. For insertions in particular, the error rate falls by about sixty to eighty percent.
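The abstract does not give the estimator or the distribution used, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a Gaussian duration model per syllable, a median-ratio stand-in for the speech rate estimate, and N-best rescoring as the post-processing step. The statistics in `DURATION_STATS`, the function names, and the toy scores are all hypothetical.

```python
import math
import statistics

# Hypothetical (mean, std) of syllable duration in frames at a reference
# speech rate. A real system would condition these on phonetic context.
DURATION_STATS = {
    "wo": (20.0, 4.0),
    "ai": (22.0, 5.0),
    "a": (18.0, 4.0),
    "ni": (19.0, 4.0),
}


def estimate_rate_factor(hypothesis):
    """Assumed estimator: median ratio of observed to reference duration.

    A value above 1 means the sentence is slower than the reference.
    """
    ratios = [duration / DURATION_STATS[syl][0] for syl, duration in hypothesis]
    return statistics.median(ratios)


def duration_log_prob(syllable, duration, rate_factor):
    """Gaussian log-density of the duration after rate normalization."""
    mean, std = DURATION_STATS[syllable]
    z = (duration / rate_factor - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def rescore(n_best, weight=1.0):
    """Pick the hypothesis with the best acoustic-plus-duration score."""
    scored = []
    for acoustic_score, hypothesis in n_best:
        factor = estimate_rate_factor(hypothesis)
        duration_score = sum(
            duration_log_prob(syl, dur, factor) for syl, dur in hypothesis
        )
        scored.append((acoustic_score + weight * duration_score, hypothesis))
    return max(scored, key=lambda item: item[0])


# Toy N-best list for a slow utterance: (acoustic log-score, [(syllable, frames)]).
# The second hypothesis contains an inserted, very short "a".
n_best = [
    (-100.0, [("wo", 30), ("ai", 33), ("ni", 29)]),
    (-99.0, [("wo", 30), ("ai", 25), ("a", 8), ("ni", 29)]),
]

best_score, best_hypothesis = rescore(n_best)
print([syl for syl, _ in best_hypothesis])
```

In this toy case the hypothesis with the inserted syllable has the better acoustic score, but its implausibly short segment is penalized after normalization, so rescoring prefers the three-syllable hypothesis. Setting `weight=0.0` disables the duration term and restores the acoustic ranking.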

Author information

Corresponding author

Correspondence to Chen YiNing.

Additional information

The research is supported by the National Natural Science Foundation of China (Grant No. 60272016).

CHEN YiNing received the B.S. and M.S. degrees in electronic engineering from Tsinghua University in 1999 and 2001 respectively. He is a Ph.D. candidate at the Department of Electronic Engineering of Tsinghua University. His research interests are speech recognition and spoken language processing.

ZHU Xuan received the B.S. degree in electronic engineering from Beijing University of Astronautics and Aeronautics in 1998 and the M.S. degree in electronic engineering from Tsinghua University in 2001. She is a Ph.D. candidate at the Department of Electronic Engineering of Tsinghua University. Her research interests are speech recognition and embedded signal processing system design.

LIU Jia received the B.S., M.S. and Ph.D. degrees in electronic engineering from Tsinghua University in 1983, 1986 and 1990 respectively, and was a postdoctoral researcher at Cambridge University from 1992 to 1994. He is a professor at the Department of Electronic Engineering of Tsinghua University. His research interests are speech recognition, speech coding, speech synthesis and speech ASIC design. Dr. Liu is a member of IEEE and a senior member of the China Institute of Electronics.

LIU RunSheng received the B.S. degree in electronic engineering from Tsinghua University in 1958. He is a professor at the Department of Electronic Engineering of Tsinghua University. His research interests are speech recognition, IC design and CAD.

About this article

Cite this article

Chen, Y., Zhu, X., Liu, J. et al. Towards robustness to speech rate in mandarin all-syllable recognition. J. Comput. Sci. & Technol. 18, 756–761 (2003). https://doi.org/10.1007/BF02945464

  • DOI: https://doi.org/10.1007/BF02945464
