
Speech Driven Facial Animation Using Chinese Mandarin Pronunciation Rules

  • Conference paper
Computational Science and Its Applications – ICCSA 2004 (ICCSA 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3045)


Abstract

This paper presents an integrated system for synthesizing facial animation from speech. A network, IFNET, composed of context-dependent HMMs (Hidden Markov Models) representing Chinese sub-syllables is used to recover the sequence of Chinese initials and finals in the input speech. Rather than being derived from a finite audio-visual database, IFNET is constructed directly from Mandarin Chinese pronunciation rules. To cope with the large amount of computation, we embed the Forward-Backward Search Algorithm in the search through IFNET. Once the initial and final sequence is obtained, it is converted to MPEG-4 high-level facial animation parameters, which drive a 3D head model to perform the corresponding facial expressions. Experimental results show that our system simulates real mouth shapes well across a wide range of spoken Mandarin.




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

You, M., Bu, J., Chen, C., Song, M. (2004). Speech Driven Facial Animation Using Chinese Mandarin Pronunciation Rules. In: Laganà, A., Gavrilova, M.L., Kumar, V., Mun, Y., Tan, C.J.K., Gervasi, O. (eds) Computational Science and Its Applications – ICCSA 2004. ICCSA 2004. Lecture Notes in Computer Science, vol 3045. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24767-8_93


  • DOI: https://doi.org/10.1007/978-3-540-24767-8_93

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22057-2

  • Online ISBN: 978-3-540-24767-8

  • eBook Packages: Springer Book Archive
