A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion

Xie, Lei; Meng, Helen; Liu, Zhi-Qiang

doi:10.1007/11939993_64

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Included in the following conference series:

International Symposium on Chinese Spoken Language Processing

Abstract

This paper proposes a novel approach towards a video-realistic, speech-driven talking face for Cantonese. We present a technique that realizes a talking face for a target language (Cantonese) using only audio-visual facial recordings for a base language (English). Given a Cantonese speech input, we first use a Cantonese speech recognizer to generate a Cantonese syllable transcription. Then we map it to an English phoneme transcription via a translingual mapping scheme that involves symbol mapping and time alignment from Cantonese syllables to English phonemes. With the phoneme transcription, the input speech, and the audio-visual models for English, an EM-based conversion algorithm is adopted to generate mouth animation parameters associated with the input Cantonese audio. We have carried out audio-visual syllable recognition experiments to objectively evaluate the proposed talking face. Results show that the visual speech synthesized by the Cantonese talking face can effectively increase the accuracy of Cantonese syllable recognition under noisy acoustic conditions.
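
The translingual mapping step described in the abstract (symbol mapping plus time alignment from Cantonese syllables to English phonemes) might be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the authors' scheme: the `Segment` type, the `SYLLABLE_TO_PHONEMES` table and its three entries, and the rule that splits each syllable's duration evenly among its phonemes are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str    # syllable or phoneme symbol
    start: float  # start time in seconds
    end: float    # end time in seconds


# Hypothetical symbol table (assumption, not from the paper): romanized
# Cantonese syllables with tones dropped, mapped to rough ARPAbet-like
# English phoneme sequences. Entries are illustrative only.
SYLLABLE_TO_PHONEMES = {
    "nei": ["n", "ey"],
    "hou": ["hh", "ow"],
    "sik": ["s", "ih", "k"],
}


def map_transcription(syllables):
    """Convert timed Cantonese syllables into timed English phonemes.

    Symbol mapping: look each syllable up in the table.
    Time alignment: divide the syllable's time span evenly among its
    phonemes (an assumed rule chosen for simplicity).
    """
    phonemes = []
    for syl in syllables:
        symbols = SYLLABLE_TO_PHONEMES.get(syl.label)
        if symbols is None:
            raise KeyError(f"no mapping for syllable {syl.label!r}")
        step = (syl.end - syl.start) / len(symbols)
        for i, sym in enumerate(symbols):
            phonemes.append(
                Segment(sym, syl.start + i * step, syl.start + (i + 1) * step)
            )
    return phonemes


if __name__ == "__main__":
    recognized = [Segment("nei", 0.0, 0.2), Segment("hou", 0.2, 0.5)]
    for seg in map_transcription(recognized):
        print(f"{seg.label}\t{seg.start:.3f}\t{seg.end:.3f}")
```

In this sketch the phoneme segments tile each syllable's interval exactly, so the resulting transcription stays synchronized with the input audio. A conversion stage such as the EM-based algorithm mentioned in the abstract would then consume the timed phonemes together with the speech signal.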

Author information

Authors and Affiliations

Human-Computer Communications Laboratory, Dept. of Systems Engineering & Engineering Management, The Chinese University of Hong Kong, Hong Kong
Lei Xie & Helen Meng
School of Creative Media, City University of Hong Kong, Hong Kong
Zhi-Qiang Liu

Editor information

Editors and Affiliations

Department of Computer Science, The University of Hong Kong, Hong Kong
Qiang Huo
Human Language Technology Department, Institute for Infocomm Research (I2R), 119613, Singapore
Bin Ma
School of Computer Engineering, Nanyang Technological University (NTU), 639798, Singapore
Eng-Siong Chng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Haizhou Li

Cite this paper

Xie, L., Meng, H., Liu, ZQ. (2006). A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_64

DOI: https://doi.org/10.1007/11939993_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)
