DOI: 10.1145/1028523.1028570
Article

Real-time speech motion synthesis from recorded motions

Published: 27 August 2004

Abstract

Data-driven approaches have been successfully used for realistic visual speech synthesis. However, little effort has been devoted to real-time lip-synching for interactive applications. In particular, algorithms that are based on a graph of motions are notorious for their exponential complexity. In this paper, we present a greedy graph search algorithm that yields vastly superior performance and allows real-time motion synthesis from a large database of motions. The time complexity of the algorithm is linear with respect to the size of an input utterance. In our experiments, the synthesis time for an input sentence of average length is under a second.
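The abstract describes the key idea only at a high level. As a rough illustration (not the authors' actual implementation), a greedy search over a motion graph can be sketched as follows: at each step, the locally best-matching motion segment is committed to without backtracking, so the work per phoneme is bounded by a node's out-degree and total time is linear in the utterance length. All names and cost functions here are hypothetical stand-ins.

```python
# Hypothetical sketch of greedy graph search for lip-sync motion synthesis.
# Unlike exhaustive graph search (exponential in utterance length), the greedy
# variant commits to the locally cheapest successor at every phoneme.

def greedy_synthesize(phonemes, graph, start, match_cost, transition_cost):
    """phonemes: target phoneme sequence of the input utterance.
    graph: dict mapping a motion-segment id to its successor segment ids.
    match_cost(seg, ph): how poorly segment `seg` fits phoneme `ph`.
    transition_cost(a, b): smoothness penalty for concatenating a -> b.
    Returns the chosen path of motion segments, beginning with `start`.
    """
    path = [start]
    current = start
    for ph in phonemes:
        candidates = graph.get(current, [])
        if not candidates:
            break  # dead end: no outgoing motion segments in the graph
        # Greedy choice: minimize the local score only, never backtrack.
        current = min(
            candidates,
            key=lambda seg: match_cost(seg, ph) + transition_cost(path[-1], seg),
        )
        path.append(current)
    return path
```

Because each phoneme triggers one pass over the current node's successors, the cost is O(utterance length x max out-degree), consistent with the linear-time claim above; the trade-off is that a greedy choice may be locally smooth but globally suboptimal.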

Supplementary Material

JPG File (p345-cao.jpg)
MPEG File (p345-cao.mpeg)
Supplemental video




Published In

SCA '04: Proceedings of the 2004 ACM SIGGRAPH/Eurographics Symposium on Computer Animation
August 2004, 388 pages
ISBN: 3905673142
Publisher: Eurographics Association, Goslar, Germany



Qualifiers

  • Article

Conference

SCA '04: Symposium on Computer Animation 2004
August 27-29, 2004, Grenoble, France

Acceptance Rates

Overall Acceptance Rate 183 of 487 submissions, 38%

Article Metrics

  • Downloads (Last 12 months)13
  • Downloads (Last 6 weeks)0
Reflects downloads up to 22 Dec 2024

Cited By
  • (2022)Avatar content creation system using scriptJournal of Digital Contents Society10.9728/dcs.2022.23.1.1123:1(11-19)Online publication date: 31-Jan-2022
  • (2021)A Novel Lip Synchronization Approach for Games and Virtual Environments2021 IEEE Conference on Games (CoG)10.1109/CoG52621.2021.9619128(1-9)Online publication date: 17-Aug-2021
  • (2018)State of the Art on Monocular 3D Face Reconstruction, Tracking, and ApplicationsComputer Graphics Forum10.1111/cgf.1338237:2(523-550)Online publication date: 22-May-2018
  • (2015)VDubComputer Graphics Forum10.1111/cgf.1255234:2(193-204)Online publication date: 1-May-2015
  • (2015)Audiovisual speech synthesisSpeech Communication10.1016/j.specom.2014.11.00166:C(182-217)Online publication date: 1-Feb-2015
  • (2014)Facial performance enhancement using dynamic shape space analysisACM Transactions on Graphics10.1145/254627633:2(1-12)Online publication date: 8-Apr-2014
  • (2014)Emotional facial expression transfer based on temporal restricted Boltzmann machinesSignal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific10.1109/APSIPA.2014.7041738(1-7)Online publication date: Dec-2014
  • (2013)A Practical and Configurable Lip Sync Method for GamesProceedings of Motion on Games10.1145/2522628.2522904(131-140)Online publication date: 6-Nov-2013
  • (2013)Visual Speech Synthesis Using a Variable-Order Switching Shared Gaussian Process Dynamical ModelIEEE Transactions on Multimedia10.1109/TMM.2013.227965915:8(1755-1768)Online publication date: 1-Dec-2013
  • (2012)Dynamic units of visual speechProceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation10.5555/2422356.2422395(275-284)Online publication date: 29-Jul-2012
