DOI: 10.1145/1028523.1028570
Article

Real-time speech motion synthesis from recorded motions

Published: 27 August 2004

Abstract

Data-driven approaches have been successfully used for realistic visual speech synthesis. However, little effort has been devoted to real-time lip-synching for interactive applications. In particular, algorithms that are based on a graph of motions are notorious for their exponential complexity. In this paper, we present a greedy graph search algorithm that yields vastly superior performance and allows real-time motion synthesis from a large database of motions. The time complexity of the algorithm is linear with respect to the size of an input utterance. In our experiments, the synthesis time for an input sentence of average length is under a second.
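The abstract describes the key idea only at a high level. As a rough illustration (not the authors' actual implementation), a greedy search over a motion graph can be sketched as follows: at each step, the locally best-matching motion segment is committed to without backtracking, so the work per phoneme is bounded by a node's out-degree and total time is linear in the utterance length. All names and cost functions here are hypothetical stand-ins.

```python
# Hypothetical sketch of greedy graph search for lip-sync motion synthesis.
# Unlike exhaustive graph search (exponential in utterance length), the greedy
# variant commits to the locally cheapest successor at every phoneme.

def greedy_synthesize(phonemes, graph, start, match_cost, transition_cost):
    """phonemes: target phoneme sequence of the input utterance.
    graph: dict mapping a motion-segment id to its successor segment ids.
    match_cost(seg, ph): how poorly segment `seg` fits phoneme `ph`.
    transition_cost(a, b): smoothness penalty for concatenating a -> b.
    Returns the chosen path of motion segments, beginning with `start`.
    """
    path = [start]
    current = start
    for ph in phonemes:
        candidates = graph.get(current, [])
        if not candidates:
            break  # dead end: no outgoing motion segments in the graph
        # Greedy choice: minimize the local score only, never backtrack.
        current = min(
            candidates,
            key=lambda seg: match_cost(seg, ph) + transition_cost(path[-1], seg),
        )
        path.append(current)
    return path
```

Because each phoneme triggers one pass over the current node's successors, the cost is O(utterance length x max out-degree), consistent with the linear-time claim above; the trade-off is that a greedy choice may be locally smooth but globally suboptimal.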

Supplementary Material

JPG File (p345-cao.jpg)
MPEG File (p345-cao.mpeg)
Supplemental video




Published In

SCA '04: Proceedings of the 2004 ACM SIGGRAPH/Eurographics Symposium on Computer Animation
August 2004, 388 pages
ISBN: 3905673142
Publisher: Eurographics Association, Goslar, Germany



Qualifiers

  • Article

Conference

SCA '04: Symposium on Computer Animation 2004
August 27-29, 2004, Grenoble, France

Acceptance Rates

Overall Acceptance Rate 183 of 487 submissions, 38%

Article Metrics

  • Downloads (Last 12 months)13
  • Downloads (Last 6 weeks)0
Reflects downloads up to 22 Dec 2024

Cited By
  • (2022)Avatar content creation system using scriptJournal of Digital Contents Society10.9728/dcs.2022.23.1.1123:1(11-19)Online publication date: 31-Jan-2022
  • (2021)A Novel Lip Synchronization Approach for Games and Virtual Environments2021 IEEE Conference on Games (CoG)10.1109/CoG52621.2021.9619128(1-9)Online publication date: 17-Aug-2021
  • (2018)State of the Art on Monocular 3D Face Reconstruction, Tracking, and ApplicationsComputer Graphics Forum10.1111/cgf.1338237:2(523-550)Online publication date: 22-May-2018
  • (2015)VDubComputer Graphics Forum10.1111/cgf.1255234:2(193-204)Online publication date: 1-May-2015
  • (2015)Audiovisual speech synthesisSpeech Communication10.1016/j.specom.2014.11.00166:C(182-217)Online publication date: 1-Feb-2015
  • (2014)Facial performance enhancement using dynamic shape space analysisACM Transactions on Graphics10.1145/254627633:2(1-12)Online publication date: 8-Apr-2014
  • (2014)Emotional facial expression transfer based on temporal restricted Boltzmann machinesSignal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific10.1109/APSIPA.2014.7041738(1-7)Online publication date: Dec-2014
  • (2013)A Practical and Configurable Lip Sync Method for GamesProceedings of Motion on Games10.1145/2522628.2522904(131-140)Online publication date: 6-Nov-2013
  • (2013)Visual Speech Synthesis Using a Variable-Order Switching Shared Gaussian Process Dynamical ModelIEEE Transactions on Multimedia10.1109/TMM.2013.227965915:8(1755-1768)Online publication date: 1-Dec-2013
  • (2012)Dynamic units of visual speechProceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation10.5555/2422356.2422395(275-284)Online publication date: 29-Jul-2012
