Abstract
In Mandarin all-syllable recognition, many insertion errors occur due to the influence of non-consonant syllables. Introducing a duration model into the recognition process is a direct way to reduce these errors, but it usually does not work as well as expected because duration is sensitive to speech rate. To address this problem, this paper proposes a novel context-dependent duration distribution normalized by speech rate and applies it to a speech recognition system based on an improved Hidden Markov Model (HMM) framework. To realize the algorithm, the authors employ a new method to estimate the speech rate of a sentence, then compute the duration probability combined with the speech rate, and finally apply this duration information in the post-processing stage. The duration model is thus incorporated efficiently, with little change to the recognition process or resource demand. Experimental results show that syllable error rates decrease significantly on two different speech corpora. For insertions in particular, the error rate falls by about sixty to eighty percent.
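The three steps named in the abstract (estimate the sentence speech rate, score durations after normalizing by that rate, and apply the score in post-processing) can be illustrated with a minimal sketch. This is not the authors' algorithm, since the abstract does not give their rate estimator or distribution. It assumes a Gaussian duration model per syllable, a rate defined as observed over expected total duration, and rescoring of candidate hypotheses. All names (`DURATION_STATS`, `estimate_rate`, `duration_log_prob`, `rescore`) and all numbers are hypothetical.

```python
import math

# Hypothetical per-syllable duration statistics (mean, std) in frames at a
# reference speech rate. A real system would estimate these from training data
# and, per the abstract, make them context dependent.
DURATION_STATS = {
    "a": (18.0, 5.0),
    "ba": (22.0, 5.0),
    "shi": (25.0, 6.0),
}


def estimate_rate(segments):
    """Assumed rate estimate: observed total duration / expected total duration.

    segments is a list of (syllable, duration_in_frames) pairs.
    A value above 1.0 means slower than the reference rate.
    """
    observed = sum(dur for _, dur in segments)
    expected = sum(DURATION_STATS[syl][0] for syl, _ in segments)
    return observed / expected


def duration_log_prob(segments, rate):
    """Sum of Gaussian log-densities of the rate-normalized durations."""
    total = 0.0
    for syl, dur in segments:
        mean, std = DURATION_STATS[syl]
        z = (dur / rate - mean) / std  # normalize duration by speech rate
        total += -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))
    return total


def rescore(hypotheses, weight=1.0):
    """Post-processing: pick the hypothesis with the best combined score.

    hypotheses is a list of (acoustic_log_score, segments) pairs.
    """
    def combined(hyp):
        score, segments = hyp
        rate = estimate_rate(segments)
        return score + weight * duration_log_prob(segments, rate)

    return max(hypotheses, key=combined)
```

In this sketch, a hypothesis containing a very short inserted syllable receives a low duration score even when its acoustic score is slightly better, so rescoring can prefer the hypothesis without the insertion. The `weight` parameter is an assumed tuning knob balancing the two scores.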
Additional information
The research is supported by the National Natural Science Foundation of China (Grant No. 60272016).
CHEN YiNing received the B.S. and M.S. degrees in electronic engineering from Tsinghua University in 1999 and 2001 respectively. He is a Ph.D. candidate at the Department of Electronic Engineering of Tsinghua University. His research interests are speech recognition and spoken language processing.
ZHU Xuan received the B.S. degree in electronic engineering from Beijing University of Astronautics and Aeronautics in 1998 and the M.S. degree in electronic engineering from Tsinghua University in 2001. She is a Ph.D. candidate at the Department of Electronic Engineering of Tsinghua University. Her research interests are speech recognition and embedded signal processing system design.
LIU Jia received the B.S., M.S. and Ph.D. degrees in electronic engineering from Tsinghua University in 1983, 1986 and 1990 respectively, and was a postdoctoral researcher at Cambridge University from 1992 to 1994. He is a professor at the Department of Electronic Engineering of Tsinghua University. His research interests are speech recognition, speech coding, speech synthesis and speech ASIC design. Dr. Liu is a member of IEEE and a senior member of China Institute of Electronics.
LIU RunSheng received the B.S. degree in electronic engineering from Tsinghua University in 1958. He is a professor at the Department of Electronic Engineering of Tsinghua University. His research interests are speech recognition, IC design and CAD.
About this article
Cite this article
Chen, Y., Zhu, X., Liu, J. et al. Towards robustness to speech rate in mandarin all-syllable recognition. J. Comput. Sci. & Technol. 18, 756–761 (2003). https://doi.org/10.1007/BF02945464
DOI: https://doi.org/10.1007/BF02945464