Visual Speech Synthesis by Morphing Visemes

December 1999

1999 Technical Report

Publisher:

Massachusetts Institute of Technology
201 Vassar Street, W59-200 Cambridge, MA
United States

Published: 01 December 1999

Abstract

We present MikeTalk, a text-to-audiovisual speech synthesizer which converts input text into an audiovisual speech stream. MikeTalk is built using visemes, which are a small set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject which is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images may be generated. A complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer is exploited to determine which viseme transitions to use, and the rate at which the morphing process should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream, and hence give the impression of a photorealistic talking face.
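The pipeline the abstract describes (morph between two visemes along a correspondence field, then pace the morph from phoneme timings) can be sketched roughly as follows. This is only an illustrative approximation, not the report's implementation: the function names `warp`, `morph`, and `transition_frames`, the `(dx, dy)` flow layout, the nearest-neighbour backward warp, the linear cross-dissolve, and the choice to let each transition last as long as its first phoneme are all assumptions made for the example. The flow fields are taken as given; computing them with an optical flow method is outside the sketch.

```python
import numpy as np

def warp(image, flow, alpha):
    """Backward-warp `image` by a fraction `alpha` of `flow`.

    Assumption: flow[y, x] = (dx, dy) in pixels, sampled nearest-neighbour.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs - alpha * flow[..., 0]), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(ys - alpha * flow[..., 1]), 0, h - 1).astype(int)
    return image[src_y, src_x]

def morph(img_a, img_b, flow_ab, flow_ba, alpha):
    """Intermediate image between visemes A and B for alpha in [0, 1].

    A is warped part of the way toward B, B is warped the rest of the
    way back toward A, and the two warped images are cross-dissolved.
    """
    a_fwd = warp(img_a, flow_ab, alpha)
    b_back = warp(img_b, flow_ba, 1.0 - alpha)
    return (1.0 - alpha) * a_fwd + alpha * b_back

def transition_frames(phonemes, phoneme_to_viseme, fps=30):
    """List (viseme_from, viseme_to, alpha) for every video frame.

    `phonemes` is a list of (phoneme, duration_in_seconds) pairs, such as
    a text-to-speech front end might report. Assumption: the transition
    out of a phoneme lasts exactly that phoneme's duration.
    """
    frames = []
    for (p_from, duration), (p_to, _) in zip(phonemes, phonemes[1:]):
        n = max(1, round(duration * fps))
        for k in range(n):
            frames.append(
                (phoneme_to_viseme[p_from], phoneme_to_viseme[p_to], k / n)
            )
    return frames
```

Each tuple returned by `transition_frames` names the pair of viseme images to pass to `morph` together with the blend position `alpha`, so a longer phoneme simply yields more intermediate frames for the same transition.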

Contributors

Tony Farid Ezzat
Center for Biological & Computational Learning
Tomaso A Poggio
Massachusetts Institute of Technology
