Abstract
This paper proposes a novel approach to a video-realistic, speech-driven talking face for Cantonese. We present a technique that realizes a talking face for a target language (Cantonese) using only audio-visual facial recordings of a base language (English). Given Cantonese speech input, we first use a Cantonese speech recognizer to generate a Cantonese syllable transcription. We then map this transcription to an English phoneme transcription via a translingual mapping scheme that involves symbol mapping and time alignment from Cantonese syllables to English phonemes. Given the phoneme transcription, the input speech, and the audio-visual models for English, an EM-based conversion algorithm generates the mouth animation parameters associated with the input Cantonese audio. We carried out audio-visual syllable recognition experiments to evaluate the proposed talking face objectively. The results show that the visual speech synthesized by the Cantonese talking face can effectively increase the accuracy of Cantonese syllable recognition under noisy acoustic conditions.
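The translingual mapping step (symbol mapping plus time alignment) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Segment` type, the `map_syllables` function, and the entries of `SYLLABLE_TO_PHONEMES` are hypothetical, and splitting each syllable's duration evenly among its phonemes is an assumption made only for illustration, since the abstract does not specify the alignment rule.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    """A labelled time span in seconds."""
    label: str
    start: float
    end: float


# Hypothetical symbol mapping: toneless Cantonese syllables to
# English phoneme sequences. These entries are illustrative only.
SYLLABLE_TO_PHONEMES: Dict[str, List[str]] = {
    "nei": ["n", "ey"],
    "hou": ["hh", "ow"],
    "sil": ["sil"],
}


def map_syllables(
    syllables: List[Segment], table: Dict[str, List[str]]
) -> List[Segment]:
    """Map time-stamped syllables to time-stamped English phonemes.

    Assumption: each syllable's duration is divided evenly among the
    phonemes it maps to. Unknown syllables raise KeyError.
    """
    phonemes: List[Segment] = []
    for syl in syllables:
        targets = table[syl.label]
        step = (syl.end - syl.start) / len(targets)
        for i, ph in enumerate(targets):
            start = syl.start + i * step
            # Pin the last phoneme to the syllable end so that
            # boundaries stay contiguous despite float rounding.
            is_last = i == len(targets) - 1
            end = syl.end if is_last else syl.start + (i + 1) * step
            phonemes.append(Segment(ph, start, end))
    return phonemes
```

For example, calling `map_syllables` on two syllables `nei` (0.0 to 0.4 s) and `hou` (0.4 to 1.0 s) produces four contiguous phoneme segments covering the same 1.0 s span, which could then be passed, together with the audio, to an audio-to-visual conversion stage.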
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xie, L., Meng, H., Liu, ZQ. (2006). A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_64
DOI: https://doi.org/10.1007/11939993_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science; Computer Science (R0)