DOI: 10.1145/3474085.3481547

A Virtual Character Generation and Animation System for E-Commerce Live Streaming

Published: 17 October 2021

Abstract

Virtual characters have been widely adopted in many areas, such as virtual assistants, virtual customer service, and robotics. In this paper, we focus on their application in e-commerce live streaming. In particular, we propose a virtual character generation and animation system that supports e-commerce live streaming with virtual characters as anchors. The system offers a virtual character face generation tool based on a weakly supervised 3D face reconstruction method. The method takes a single photo as input and generates a 3D face model with both similarity and aesthetics taken into account. It requires no 3D face annotation data, thanks to a differentiable neural rendering technique that seamlessly integrates rendering into a deep-learning-based 3D face reconstruction framework. Moreover, the system provides two animation approaches that support two different modes of live streaming. The first approach is based on real-time motion capture: an actor's performance is captured in real time via a monocular camera and then used to animate a virtual anchor. The second approach is text-driven animation, in which human-like animation is automatically generated from a text script. The relationship between text script and animation is learned from training data that can be accumulated via the motion-capture-based animation. To the best of our knowledge, the presented work is the first sophisticated virtual character generation and animation system designed for e-commerce live streaming and actually deployed on an online shopping platform with millions of daily viewers.
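
The face-generation claim hinges on differentiable rendering standing in for 3D supervision: image-space losses are back-propagated through the renderer to the predicted face parameters, so no 3D annotations are needed. The abstract gives no implementation details, so the following is only a minimal sketch of such a weakly supervised training step, assuming a 3DMM-style coefficient regressor and a differentiable renderer with a hypothetical interface; the function names, loss weights, and dimensions are illustrative and not taken from the paper.

import torch
import torch.nn as nn

class FaceCoefficientRegressor(nn.Module):
    """Illustrative encoder mapping a single photo to 3DMM-style coefficients
    (identity, expression, texture, pose, lighting)."""
    def __init__(self, backbone: nn.Module, feat_dim: int = 512, coeff_dim: int = 257):
        super().__init__()
        self.backbone = backbone                    # any image encoder producing feat_dim features
        self.head = nn.Linear(feat_dim, coeff_dim)  # regresses the face parameters

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(image))

def weakly_supervised_step(model, renderer, landmark_detector, image, optimizer):
    """One training step. `renderer` is assumed to be a differentiable renderer
    returning a rendered image and projected 2D landmarks, so image-space losses
    flow back to the predicted coefficients without any 3D ground truth."""
    coeffs = model(image)
    rendered, projected_lmks = renderer(coeffs)           # hypothetical renderer API
    photo_loss = (rendered - image).abs().mean()          # photometric consistency
    lmk_loss = ((projected_lmks - landmark_detector(image)) ** 2).mean()
    loss = photo_loss + 0.1 * lmk_loss                    # loss weights are placeholders
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()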

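For the text-driven animation mode, the abstract states only that the mapping from text script to motion is learned from data accumulated by the motion-capture pipeline. The sketch below shows one generic way such a mapping could be modeled as a sequence model trained on (script, motion) pairs; the Transformer architecture, pose dimensionality, and training loss are assumptions made for illustration, not the authors' design.

import torch
import torch.nn as nn

class TextToMotion(nn.Module):
    """Illustrative text-to-animation model: a Transformer encoder over script
    tokens with a linear head predicting one pose vector per token. A real system
    would also handle the text/motion length mismatch (e.g. via a decoder)."""
    def __init__(self, vocab_size: int, pose_dim: int = 72, d_model: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.to_pose = nn.Linear(d_model, pose_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(token_ids))   # (batch, seq_len, d_model)
        return self.to_pose(h)                    # (batch, seq_len, pose_dim)

# Training would regress against motion accumulated from the mocap pipeline, e.g.:
#   loss = torch.nn.functional.mse_loss(model(script_tokens), captured_poses)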

Cited By

  • Impact of AI-Oriented Live-Streaming E-Commerce Service Failures on Consumer Disengagement—Empirical Evidence from China. Journal of Theoretical and Applied Electronic Commerce Research 19(2), 1580-1598 (2024). https://doi.org/10.3390/jtaer19020077
  • The Effects of Trust, Perceived Risk, Innovativeness, and Deal Proneness on Consumers' Purchasing Behavior in the Livestreaming Social Commerce Context. Sustainability 15(23), 16320 (2023). https://doi.org/10.3390/su152316320

    Published In

    MM '21: Proceedings of the 29th ACM International Conference on Multimedia
    October 2021, 5796 pages
    ISBN: 9781450386517
    DOI: 10.1145/3474085

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 October 2021

    Author Tags

    1. motion-capture-to-animation
    2. photo-to-avatar
    3. text-to-animation

    Qualifiers

    • Research-article

    Conference

    MM '21: ACM Multimedia Conference
    October 20-24, 2021
    Virtual Event, China

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
