DOI: 10.1145/3474085.3481547

A Virtual Character Generation and Animation System for E-Commerce Live Streaming

Published: 17 October 2021

Abstract

Virtual characters have been widely adopted in many areas, such as virtual assistants, virtual customer service, and robotics. In this paper, we focus on their application in e-commerce live streaming. In particular, we propose a virtual character generation and animation system that supports e-commerce live streaming with virtual characters as anchors. The system offers a virtual character face generation tool based on a weakly supervised 3D face reconstruction method. The method takes a single photo as input and generates a 3D face model with both similarity and aesthetics taken into account. It requires no 3D face annotation data, thanks to a differentiable neural rendering technique that seamlessly integrates rendering into a deep-learning-based 3D face reconstruction framework. Moreover, the system provides two animation approaches that support two different modes of live streaming. The first approach is based on real-time motion capture: an actor's performance is captured in real time via a monocular camera and then used to animate a virtual anchor. The second approach is text-driven animation, in which human-like animation is automatically generated from a text script. The relationship between text script and animation is learned from training data that can be accumulated via the motion-capture-based animation. To the best of our knowledge, the presented work is the first sophisticated virtual character generation and animation system designed for e-commerce live streaming and actually deployed on an online shopping platform with millions of daily viewers.
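
The face-generation claim hinges on differentiable rendering standing in for 3D supervision: image-space losses are back-propagated through the renderer to the predicted face parameters, so no 3D annotations are needed. The abstract gives no implementation details, so the following is only a minimal sketch of such a weakly supervised training step, assuming a 3DMM-style coefficient regressor and a differentiable renderer with a hypothetical interface; the function names, loss weights, and dimensions are illustrative and not taken from the paper.

import torch
import torch.nn as nn

class FaceCoefficientRegressor(nn.Module):
    """Illustrative encoder mapping a single photo to 3DMM-style coefficients
    (identity, expression, texture, pose, lighting)."""
    def __init__(self, backbone: nn.Module, feat_dim: int = 512, coeff_dim: int = 257):
        super().__init__()
        self.backbone = backbone                    # any image encoder producing feat_dim features
        self.head = nn.Linear(feat_dim, coeff_dim)  # regresses the face parameters

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(image))

def weakly_supervised_step(model, renderer, landmark_detector, image, optimizer):
    """One training step. `renderer` is assumed to be a differentiable renderer
    returning a rendered image and projected 2D landmarks, so image-space losses
    flow back to the predicted coefficients without any 3D ground truth."""
    coeffs = model(image)
    rendered, projected_lmks = renderer(coeffs)           # hypothetical renderer API
    photo_loss = (rendered - image).abs().mean()          # photometric consistency
    lmk_loss = ((projected_lmks - landmark_detector(image)) ** 2).mean()
    loss = photo_loss + 0.1 * lmk_loss                    # loss weights are placeholders
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()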

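For the text-driven animation mode, the abstract states only that the mapping from text script to motion is learned from data accumulated by the motion-capture pipeline. The sketch below shows one generic way such a mapping could be modeled as a sequence model trained on (script, motion) pairs; the Transformer architecture, pose dimensionality, and training loss are assumptions made for illustration, not the authors' design.

import torch
import torch.nn as nn

class TextToMotion(nn.Module):
    """Illustrative text-to-animation model: a Transformer encoder over script
    tokens with a linear head predicting one pose vector per token. A real system
    would also handle the text/motion length mismatch (e.g. via a decoder)."""
    def __init__(self, vocab_size: int, pose_dim: int = 72, d_model: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.to_pose = nn.Linear(d_model, pose_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(token_ids))   # (batch, seq_len, d_model)
        return self.to_pose(h)                    # (batch, seq_len, pose_dim)

# Training would regress against motion accumulated from the mocap pipeline, e.g.:
#   loss = torch.nn.functional.mse_loss(model(script_tokens), captured_poses)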

Cited By

  • Impact of AI-Oriented Live-Streaming E-Commerce Service Failures on Consumer Disengagement—Empirical Evidence from China. Journal of Theoretical and Applied Electronic Commerce Research 19(2), 1580-1598 (2024). https://doi.org/10.3390/jtaer19020077
  • The Effects of Trust, Perceived Risk, Innovativeness, and Deal Proneness on Consumers' Purchasing Behavior in the Livestreaming Social Commerce Context. Sustainability 15(23), 16320 (2023). https://doi.org/10.3390/su152316320

    Published In

    MM '21: Proceedings of the 29th ACM International Conference on Multimedia
    October 2021, 5796 pages
    ISBN: 9781450386517
    DOI: 10.1145/3474085

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 October 2021

    Author Tags

    1. motion-capture-to-animation
    2. photo-to-avatar
    3. text-to-animation

    Qualifiers

    • Research-article

    Conference

    MM '21: ACM Multimedia Conference
    October 20-24, 2021
    Virtual Event, China

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
