DOI: 10.1111/cgf.14117
Research article

Intuitive facial animation editing based on a generative RNN framework

Published: 22 November 2020

Abstract

Producing convincing facial animation has been a major concern for decades, and interest has only accelerated with the recent explosion of 3D content in both entertainment and professional activities. Motion capture and retargeting have arguably become the dominant solution to meet this demand. Yet, despite their high levels of quality and automation, performance-based animation pipelines still require manual cleaning and editing to refine raw results, a time- and skill-demanding process. In this paper, we leverage machine learning to make facial animation editing faster and more accessible to non-experts. Inspired by recent image inpainting methods, we design a generative recurrent neural network that synthesizes realistic motion within designated segments of an existing facial animation, optionally following user-provided guiding constraints. Our system handles both supervised and unsupervised editing scenarios, such as motion filling during occlusions, expression correction, semantic content modification, and noise filtering. We demonstrate the usability of our system on several animation editing use cases.
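The abstract describes the system only at a high level. As a rough, hypothetical sketch of the core idea (a recurrent generator that completes masked segments of an animation, optionally steered by sparse user constraints), the PyTorch snippet below shows one way such masked sequence infilling can be wired up. The class name, the control-vector size, the bidirectional-GRU architecture, and the input encoding are all illustrative assumptions, not the authors' actual model.

```python
# Hypothetical sketch of masked animation infilling, inspired by the
# inpainting-style editing described in the abstract. Not the paper's model.
import torch
import torch.nn as nn

class InfillRNN(nn.Module):
    """Bidirectional GRU that regenerates masked frames of an animation."""

    def __init__(self, n_controls: int = 34, hidden: int = 256):
        super().__init__()
        # Per frame we feed: the masked animation, the mask flag,
        # the constraint pose, and a constraint-present flag.
        self.rnn = nn.GRU(
            input_size=2 * n_controls + 2,
            hidden_size=hidden,
            num_layers=2,
            batch_first=True,
            bidirectional=True,
        )
        self.head = nn.Linear(2 * hidden, n_controls)

    def forward(self, anim, mask, constraints, cmask):
        # anim:        (B, T, C) animation curves (e.g. blendshape weights)
        # mask:        (B, T, 1) 1.0 on frames to regenerate, 0.0 elsewhere
        # constraints: (B, T, C) optional guiding poses, zero where unused
        # cmask:       (B, T, 1) 1.0 on frames carrying a user constraint
        x = torch.cat(
            [anim * (1.0 - mask), mask, constraints * cmask, cmask], dim=-1
        )
        h, _ = self.rnn(x)
        gen = self.head(h)
        # Copy untouched frames through; synthesize only inside the mask.
        return anim * (1.0 - mask) + gen * mask


# Toy usage: regenerate frames 40-79 of a 120-frame clip, unconstrained.
model = InfillRNN()
anim = torch.randn(1, 120, 34)
mask = torch.zeros(1, 120, 1)
mask[:, 40:80] = 1.0
constraints = torch.zeros(1, 120, 34)
cmask = torch.zeros(1, 120, 1)
edited = model(anim, mask, constraints, cmask)  # (1, 120, 34)
```

In this sketch the network always passes unmasked frames through unchanged and only synthesizes content inside the mask, mirroring how inpainting-style editing preserves the untouched parts of the original performance.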

Supplementary Material

MP4 File (a22-berson.mp4)


Cited By

• FDLS: A Deep Learning Approach to Production Quality, Controllable, and Retargetable Facial Performances. Proceedings of the 2022 Digital Production Symposium (Aug. 2022), pp. 1-9. DOI: 10.1145/3543664.3543672


Published In

SCA '20: Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation
October 2020, 252 pages

In-Cooperation

• EUROGRAPHICS: The European Association for Computer Graphics

Publisher

Eurographics Association, Goslar, Germany

Qualifiers

• Research article

Conference

SCA '20

Acceptance Rates

Overall acceptance rate: 183 of 487 submissions (38%)

Article Metrics

• Downloads (last 12 months): 7
• Downloads (last 6 weeks): 0

Download counts reflect usage up to 12 Nov 2024.
