Abstract
In this paper, we propose a technique for synthesizing photo-realistic facial animation from text based on hidden Markov models (HMMs) and deep neural networks (DNNs) with facial features, for implementing an interactive agent. In the proposed technique, we use Animation Units (AUs), facial features that express the state of each part of the face and can be obtained with a Kinect sensor. We synthesize the facial features from arbitrary text using the same framework as HMM-based speech synthesis: the facial features are generated from the HMM and then converted into pixel intensities by the DNN. We investigate appropriate conditions for training the HMM and the DNN, and then perform an objective evaluation comparing the proposed technique with a conventional technique based on principal component analysis (PCA).
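The following is a minimal sketch of the two-stage pipeline described above, not the authors' implementation. It assumes pre-trained models and replaces HMM-based parameter generation with a placeholder trajectory generator; the AU dimension, network sizes, and all function names are illustrative assumptions.

```python
import numpy as np

AU_DIM = 6            # assumption: Kinect v1 face tracking exposes 6 Animation Units
IMG_PIXELS = 64 * 64  # assumption: low-resolution grayscale face image

rng = np.random.default_rng(0)

def generate_au_trajectory(num_frames: int) -> np.ndarray:
    """Stand-in for HMM-based parameter generation. In the paper, AU
    trajectories are generated from text with the same framework as
    HMM-based speech synthesis; here we return a smooth random walk."""
    steps = rng.normal(scale=0.02, size=(num_frames, AU_DIM))
    return np.clip(np.cumsum(steps, axis=0), -1.0, 1.0)

class AuToPixelDNN:
    """Toy feedforward network mapping one AU frame to pixel intensities.
    Weights are random here; in practice they would be trained on
    (AU, image) pairs captured with Kinect."""
    def __init__(self, hidden: int = 256):
        self.W1 = rng.normal(scale=0.1, size=(AU_DIM, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.1, size=(hidden, IMG_PIXELS))
        self.b2 = np.zeros(IMG_PIXELS)

    def forward(self, au: np.ndarray) -> np.ndarray:
        h = np.tanh(au @ self.W1 + self.b1)                     # hidden layer
        return 1.0 / (1.0 + np.exp(-(h @ self.W2 + self.b2)))  # pixels in [0, 1]

if __name__ == "__main__":
    dnn = AuToPixelDNN()
    trajectory = generate_au_trajectory(num_frames=100)  # one utterance
    frames = np.stack([dnn.forward(au) for au in trajectory])
    print(frames.shape)  # (100, 4096): one image per animation frame
```

Under this decomposition, the HMM stage handles the temporal structure of the animation while the DNN handles the nonlinear mapping from the compact AU space to the high-dimensional pixel space.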
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Sato, K., Nose, T., Ito, A. (2017). Synthesis of Photo-Realistic Facial Animation from Text Based on HMM and DNN with Animation Unit. In: Pan, JS., Tsai, PW., Huang, HC. (eds) Advances in Intelligent Information Hiding and Multimedia Signal Processing. Smart Innovation, Systems and Technologies, vol 64. Springer, Cham. https://doi.org/10.1007/978-3-319-50212-0_4
DOI: https://doi.org/10.1007/978-3-319-50212-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50211-3
Online ISBN: 978-3-319-50212-0
eBook Packages: Engineering (R0)