Abstract
This paper presents an integrated system for synthesizing facial animation from speech. A network called IFNET, composed of context-dependent HMMs (Hidden Markov Models) representing Chinese sub-syllables, is employed to extract the corresponding sequence of Chinese initials and finals from the input speech. Rather than being built from a finite audio-visual database, IFNET is constructed directly from Mandarin Chinese pronunciation rules. To cope with the large amount of computation, we embed the Forward-Backward Search Algorithm in the search through IFNET. Once the initial and final sequence is obtained, it is converted into MPEG-4 high-level facial animation parameters, which drive a 3D head model to perform the corresponding facial expressions. Experimental results show that our system simulates real mouth shapes well, given speech input from many different Mandarin-speaking situations.
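The final stage described in the abstract maps the recognized initial/final sequence onto MPEG-4 facial animation parameters. A minimal sketch of that idea is below; the table entries, viseme IDs, and function names are illustrative assumptions, not the paper's actual mapping.

```python
# Hypothetical sketch: expanding a recognized Mandarin initial/final
# sequence into a per-frame MPEG-4 viseme stream. The viseme ID
# assignments below are assumptions for illustration only.
VISEME_TABLE = {
    "b": 1,    # bilabial closure (assumed ID)
    "sh": 10,  # postalveolar fricative (assumed ID)
    "a": 14,   # open vowel (assumed ID)
    "i": 6,    # close front vowel (assumed ID)
    "ao": 3,   # back rounded diphthong (assumed ID)
}

def to_viseme_stream(units, frames_per_unit=5):
    """Expand an initial/final sequence into a per-frame viseme stream.

    Each unit is held for `frames_per_unit` animation frames; unmapped
    units fall back to 0 (neutral face).
    """
    stream = []
    for u in units:
        viseme = VISEME_TABLE.get(u, 0)
        stream.extend([viseme] * frames_per_unit)
    return stream

# "b a" (the syllable "ba") followed by "sh ao" ("shao")
print(to_viseme_stream(["b", "a", "sh", "ao"], frames_per_unit=2))
# -> [1, 1, 14, 14, 10, 10, 3, 3]
```

A real system would additionally interpolate between visemes across frame boundaries to model coarticulation, rather than holding each mouth shape constant.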
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
You, M., Bu, J., Chen, C., Song, M. (2004). Speech Driven Facial Animation Using Chinese Mandarin Pronunciation Rules. In: Laganá, A., Gavrilova, M.L., Kumar, V., Mun, Y., Tan, C.J.K., Gervasi, O. (eds) Computational Science and Its Applications – ICCSA 2004. ICCSA 2004. Lecture Notes in Computer Science, vol 3045. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24767-8_93
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22057-2
Online ISBN: 978-3-540-24767-8
eBook Packages: Springer Book Archive