DOI: 10.1111/cgf.14117
Research article

Intuitive facial animation editing based on a generative RNN framework

Published: 22 November 2020

Abstract

Producing convincing facial animation has been a major concern for decades, and interest has only accelerated with the recent explosion of 3D content in both entertainment and professional activities. Motion capture and retargeting have arguably become the dominant solution to meet this demand. Yet, despite their high levels of quality and automation, performance-based animation pipelines still require manual cleaning and editing to refine raw results, a time- and skill-demanding process. In this paper, we leverage machine learning to make facial animation editing faster and more accessible to non-experts. Inspired by recent image inpainting methods, we design a generative recurrent neural network that synthesizes realistic motion within designated segments of an existing facial animation, optionally following user-provided guiding constraints. Our system handles both supervised and unsupervised editing scenarios, such as motion filling during occlusions, expression correction, semantic content modification, and noise filtering. We demonstrate the usability of our system on several animation editing use cases.
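The abstract describes the system only at a high level. As a rough, hypothetical sketch of the core idea (a recurrent generator that completes masked segments of an animation, optionally steered by sparse user constraints), the PyTorch snippet below shows one way such masked sequence infilling can be wired up. The class name, the control-vector size, the bidirectional-GRU architecture, and the input encoding are all illustrative assumptions, not the authors' actual model.

```python
# Hypothetical sketch of masked animation infilling, inspired by the
# inpainting-style editing described in the abstract. Not the paper's model.
import torch
import torch.nn as nn

class InfillRNN(nn.Module):
    """Bidirectional GRU that regenerates masked frames of an animation."""

    def __init__(self, n_controls: int = 34, hidden: int = 256):
        super().__init__()
        # Per frame we feed: the masked animation, the mask flag,
        # the constraint pose, and a constraint-present flag.
        self.rnn = nn.GRU(
            input_size=2 * n_controls + 2,
            hidden_size=hidden,
            num_layers=2,
            batch_first=True,
            bidirectional=True,
        )
        self.head = nn.Linear(2 * hidden, n_controls)

    def forward(self, anim, mask, constraints, cmask):
        # anim:        (B, T, C) animation curves (e.g. blendshape weights)
        # mask:        (B, T, 1) 1.0 on frames to regenerate, 0.0 elsewhere
        # constraints: (B, T, C) optional guiding poses, zero where unused
        # cmask:       (B, T, 1) 1.0 on frames carrying a user constraint
        x = torch.cat(
            [anim * (1.0 - mask), mask, constraints * cmask, cmask], dim=-1
        )
        h, _ = self.rnn(x)
        gen = self.head(h)
        # Copy untouched frames through; synthesize only inside the mask.
        return anim * (1.0 - mask) + gen * mask


# Toy usage: regenerate frames 40-79 of a 120-frame clip, unconstrained.
model = InfillRNN()
anim = torch.randn(1, 120, 34)
mask = torch.zeros(1, 120, 1)
mask[:, 40:80] = 1.0
constraints = torch.zeros(1, 120, 34)
cmask = torch.zeros(1, 120, 1)
edited = model(anim, mask, constraints, cmask)  # (1, 120, 34)
```

In this sketch the network always passes unmasked frames through unchanged and only synthesizes content inside the mask, mirroring how inpainting-style editing preserves the untouched parts of the original performance.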

Supplementary Material

MP4 File (a22-berson.mp4)


Cited By

• FDLS: A Deep Learning Approach to Production Quality, Controllable, and Retargetable Facial Performances. Proceedings of the 2022 Digital Production Symposium (Aug. 2022), pp. 1-9. DOI: 10.1145/3543664.3543672


Published In

SCA '20: Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation
October 2020, 252 pages

In-Cooperation

• EUROGRAPHICS: The European Association for Computer Graphics

Publisher

Eurographics Association, Goslar, Germany

Qualifiers

• Research article

Conference

SCA '20

Acceptance Rates

Overall acceptance rate: 183 of 487 submissions (38%)

Article Metrics

• Downloads (last 12 months): 7
• Downloads (last 6 weeks): 0

Download counts reflect usage up to 12 Nov 2024.
