Abstract
This paper presents an integrated system for synthesizing facial animation from speech. A network called IFNET, composed of context-dependent HMMs (Hidden Markov Models) representing Chinese sub-syllables, is employed to extract the corresponding sequence of Chinese initials and finals from the input speech. Rather than being built from a finite audio-visual database, IFNET is constructed directly from Mandarin Chinese pronunciation rules. To cope with the large amount of computation, we embed the Forward-Backward Search Algorithm in the search through IFNET. Once the initial and final sequence is obtained, it is converted into MPEG-4 high-level facial animation parameters, which drive a 3D head model to perform the corresponding facial expressions. Experimental results show that our system simulates real mouth shapes well, given speech input from many different Mandarin-speaking situations.
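The final stage described in the abstract maps the recognized initial/final sequence onto MPEG-4 facial animation parameters. A minimal sketch of that idea is below; the table entries, viseme IDs, and function names are illustrative assumptions, not the paper's actual mapping.

```python
# Hypothetical sketch: expanding a recognized Mandarin initial/final
# sequence into a per-frame MPEG-4 viseme stream. The viseme ID
# assignments below are assumptions for illustration only.
VISEME_TABLE = {
    "b": 1,    # bilabial closure (assumed ID)
    "sh": 10,  # postalveolar fricative (assumed ID)
    "a": 14,   # open vowel (assumed ID)
    "i": 6,    # close front vowel (assumed ID)
    "ao": 3,   # back rounded diphthong (assumed ID)
}

def to_viseme_stream(units, frames_per_unit=5):
    """Expand an initial/final sequence into a per-frame viseme stream.

    Each unit is held for `frames_per_unit` animation frames; unmapped
    units fall back to 0 (neutral face).
    """
    stream = []
    for u in units:
        viseme = VISEME_TABLE.get(u, 0)
        stream.extend([viseme] * frames_per_unit)
    return stream

# "b a" (the syllable "ba") followed by "sh ao" ("shao")
print(to_viseme_stream(["b", "a", "sh", "ao"], frames_per_unit=2))
# -> [1, 1, 14, 14, 10, 10, 3, 3]
```

A real system would additionally interpolate between visemes across frame boundaries to model coarticulation, rather than holding each mouth shape constant.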
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
You, M., Bu, J., Chen, C., Song, M. (2004). Speech Driven Facial Animation Using Chinese Mandarin Pronunciation Rules. In: Laganá, A., Gavrilova, M.L., Kumar, V., Mun, Y., Tan, C.J.K., Gervasi, O. (eds) Computational Science and Its Applications – ICCSA 2004. ICCSA 2004. Lecture Notes in Computer Science, vol 3045. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24767-8_93
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22057-2
Online ISBN: 978-3-540-24767-8
eBook Packages: Springer Book Archive