Development of Facial Expressions Generator For Emotion Expressive Humanoid Robot
Abstract— Human-robot communication is a very important aspect in the field of Humanoid Robotics. Non-verbal communication is one of the components that make interaction natural. However, despite the most recent efforts, robots can still show only limited expression capabilities. The purpose of this work is to create a facial expression generator that can be applied to the new 24-DoF head of the humanoid robot KOBIAN-R. In this paper, we present a system based on relevant studies of human communication and facial anatomy, adapted to the specific robotic face. It makes use of polynomial classifiers and is able to produce over 600,000 combinations of facial cues, together with appropriate neck movement. Results showed that the recognition rate of expressions produced by this system is comparable to the recognition rate of the most common facial expressions. We conclude that our system can successfully improve the communication capabilities of the robot KOBIAN-R, and that there is potential for using it to implement more complex interaction.

Keywords: Facial Expressions; Human-Robot Interaction; Emotion Recognition

I. INTRODUCTION

Communication between humans and robots is one of the most important aspects in the field of Humanoid Robotics. As humans, we communicate through complex languages, and use different types of nonverbal cues, such as haptics, kinesics, proxemics and paralanguage [1]. For each of these aspects, there are different branches of research. Therefore, the path to the achievement of natural communication is long and involves interdisciplinary components, such as facial recognition, speech recognition, linguistics, mental modelling, emotion expression and so on.

Figure 1. Emotion expression humanoid robot KOBIAN

However, expression capabilities are usually limited, and the small number of pre-defined patterns is one of the reasons why humanoid robots are still unable to show natural interaction. There is a need to extend the rigid concept of patterns: Kismet [7] was the first robot to be based on a model that could blend emotions; this kind of approach has also been attempted, regarding full body poses, on the robot Nao [8]. We want to pursue that direction.

In emotionally responsive interaction, human likeness can be improved if emotions are not limited to the six typical basic ones (fear, anger, disgust, happiness, sadness, and surprise), but are rather produced by merging several parameters. Following an even broader concept, it would be interesting to include in such a system moods and intentions that are not strictly defined as emotions, but that may follow one another during a conversation.

Therefore, as a first step within a bigger scope, in this study we present the development of a system that generates facial expressions for KOBIAN-R, selecting an appropriate combination of facial cues depending on inner feelings, thus resulting in a more natural and less hardwired way of displaying emotions as well as communication acts. We applied this system both to a real robotic face and to a virtual one. While the correspondence to facial cues is specific to KOBIAN-R, the emotion model that lies behind it can potentially be used on other robots.

The rest of the paper is organized as follows: in section II we explain the hardware that was used and the conceptual models; in section III we present the results of the survey and mention future works; in section IV we draw conclusions.
As said, a set of possible configurations of motor angles was created for each facial part. We tried to maximize the number of configurations while minimizing the number of variables used. Each motor angle configuration will be described by those robotic cue variables. In the case of the eyebrows, the 3 cues used correspond exactly to AU1, 2, and 4 (Fig. 3); but in the case of the mouth, in which AUs are present in significant number, the AUs have been reduced to 8 and then paired (one the opposite of the other, such as opening/closing lips), so that just 4 variables are enough to represent the robotic cues of the mouth. This simplification makes the classifiers' learning problem easier.

Once each facial part has its set of configurations defined, this resulting "alphabet of non-lexical words" needs a meaning to be assigned to each configuration. In this regard, the studies of I. Poggi [16], [17], [18] have been used to find these correspondences, together with the AUs defined by Ekman for each emotion. However, applying Ekman's indications [19] strictly is not feasible because of the differences with the human face; moreover, using a "categorical model", as defined by Smith and Scott [20], would not be appropriate for our goal.

Conversely, we considered using a "componential approach" to the meaning of facial expressions [20], [21]. In this case, each cue is a component that has an exact meaning that influences the overall meaning of the expression. Smith and Scott proposed a table [20] that links meanings to hypothesized individual facial actions. For instance, there is a link between surprise and the action of raising eyebrows, or between pleasantness and smiling. This could be a solid foundation for deciding how to decompose expressions, but the meanings and facial actions they propose are not enough to cover the wide range of expressions we are considering. Therefore, as the relationship between cues and meaning is sometimes obscure, we decided to rely on 6 classifiers to map this correspondence, training them with a table similar to Smith and Scott's, expanded using the above-mentioned sources ([13], [18]). Degree-3 polynomial classifiers have been used. They are based on Fisher's linear classifier and minimize the errors in the least-squares sense. Polynomial features are added to the input dataset: monomials as well as combinations of 2nd-order terms are constructed and then classified. This solution has been chosen because, compared to neural networks, polynomial classifiers can map the data with error close to zero on the training set.

Although many of the parameters one could encounter would be useful in a full body dialogue system, at least certainty, power relationship and affective state are useful for our purpose, which is limited to what can be conveyed only through facial expressions and neck movement. For example, there is an obvious link between power relationship and movement of the head and eyes: when feeling superior to the person in front, usually the head tends to move upwards and the eyes tend to look down [18]. The opposite is true when feeling inferior. The degree of certainty also influences facial cues [18].

For the affective state, we need to use or make an emotion model that is possibly more extensive than existing 2D or 3D models (such as in [5], [7] and [15]), but at the same time schematized enough to easily train a classifier. Plutchik's wheel [23] satisfies these requirements. In that model, the 8 basic emotions are opposite in pairs to each other. This fact reduces the variables we need to use to 4.

In our model (Fig. 4), joining the 4 Plutchik parameters to the 2 additional ones that describe the performative, we get a space that represents communication acts. The 6 basic parameters are: Mood, Stance, Temperament, Expectation, Certainty and Power relationship. While in Plutchik's model secondary emotions can be extrapolated from two adjacent branches of the wheel, in our model it is possible to span a huge number of emotions, which can lie in between any two or more of the basic emotions. Extending this concept to all 6 parameters, each communication act is a point in the space.

We composed an arbitrarily long list of composite emotions/communication acts, borrowing labels not only from Plutchik's secondary emotions, but also from examples in [18] and from the HUMAINE Emotion Annotation and Representation Language (EARL) [24]. Numeric values have been assigned following those sources, where possible.

D. Mathematical implementation

To combine these two models, the mathematical model itself has been defined as follows.
There are 6 classifiers, one for each part of the face, plus one for the neck. They are given vectors f as input. The set F containing all the fi is a limited subset of ℝ⁶. Each fi is composed of the above-mentioned 6 parameters fi1,…,fi6, where −99 ≤ fij ≤ 99. These bounds were chosen because they are divisible by 3, as each emotion of Plutchik's model has 3 degrees of intensity.

The classifiers are trained using a set of 25 expressions: one for neutral, plus 4 vectors for each parameter: 2 degrees of intensity for each direction, as in Fig. 5.

The outputs of the classifiers are the robotic cue variables mentioned in part B. We call these outputs p. The values of their components range from 0 to 1 or from −1 to 1, depending on the facial part. For each of the facial parts there is a set C = c1,…,cn of possible configurations. Each configuration is defined in the same way as the vectors p, with real values ranging from 0 to 1 or from −1 to 1.

Through the use of the 1-nearest-neighbour algorithm, we find, for each facial part, c*, the vector that is closest to p among the possible configurations. From this point, we can get the corresponding vector s* of simulator parameters and the vector m* containing the real motor angles through lookup tables. Fig. 6 shows the whole process of generation of these values.

Figure 6. Generation of eyebrows (EB) motor angle values from a sample input with random values. A 2-way arrow indicates bijective correspondence through a table.

III. RESULTS AND DISCUSSIONS

A. Experimental setup

The assessment of this system was done through a web survey, in which 47 subjects differing in gender (male: 28; female: 19), age (average: 26.7) and nationality (11 Japanese, 10 Italians, the rest from other countries) were asked to evaluate expressions, comparing each to the adjacent neutral expression (Fig. 7).

Data was divided into 3 sets (shown in Fig. 8): the first is IE, a collection of 6 facial expressions including Plutchik's secondary emotions (standing in between two primary "leaves") and Plutchik's primary emotions that have not been used for training the classifiers. All of them have been generated automatically by using input values coherently. For instance, we used, among others, Terror (0, 0, 99, 0, 0, 0), Fear (0, 0, 66, 0, 0, 0) and Surprise (0, 0, 0, 66, 0, 0) for training. In the survey, Awe (0, 0, 66, 66, 0, 0) and Apprehension (0, 0, 33, 0, 0, 0) are evaluated.

The set IC is a collection of 6 labelled facial expressions generated automatically by using input values that are not extracted from Plutchik's model. The resulting set includes communication acts that are not strictly emotions, using all 6 parameters of the extended model.

The last set, IR, is a collection of 6 unlabelled facial expressions produced by random inputs. Values in FR, the subset of F that produces IR, are randomized between −99 and 99, and then only values bigger than the threshold (1) are taken, because we considered as noise the random values whose absolute values are smaller than the average:

  f_ij = f_ij   if f_ij > AVG_j(f_ij)
  f_ij = 0      if f_ij ≤ AVG_j(f_ij)        (1)

The index i refers to the i-th expression within the set; the index j refers to the parameter of the single input vector. This results in input vectors that contain 2 to 4 active parameters. Without any label, the classification and display of these expressions are supposed to produce ambiguous expressions, difficult to judge, but possibly more interesting to study than the other sets.

Each set contains exactly 6 expressions in order to be able to compare their recognition results with other contributions in the field, usually based on the 6 basic expressions plus the neutral expression, thus giving 6 answer choices in the survey. The sets were shown in the following order: IE, IC, IR. The inner order of the faces was randomized. The final question, "How do you rate the expressiveness of the robot", had to be answered through a 4-point Likert scale, spanning from "Very inexpressive" to "Very expressive".

Expressions in IE and IC were evaluated by choosing only one out of 6 labels. For IR the assessment was more complicated. As it is impossible to label the expressions, there is no way of clearly evaluating whether recognition is correct or not. For this reason, the assessment was done in the following way:

• Each expression fR of IR is evaluated through 5-point Likert scales, one for each of the 6 parameters.

• The Euclidean distance of the resulting vector of evaluated parameters fR1,…,fR6 from the input values fi1,…,fi6 that produced the facial expression gives a numerical measure of the accuracy of the recognition.

• The distance is then normalized by dividing by the norm n, defined as:

  n = ||(99, 99, 99, 99, 99, 99)|| × 2        (2)

• The same procedure is followed for the set IC, so that results can be compared.
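The assessment procedure just described — per-parameter Likert estimates compared against the generating input by Euclidean distance, normalized by (2) — can be sketched as follows. This is a minimal illustration, not the authors' code; in particular, the linear Likert-to-parameter mapping is an assumption.

```python
import math

# Maximum possible distance between two vectors in [-99, 99]^6, eq. (2)
NORM = math.sqrt(6 * 99 ** 2) * 2

def likert_to_param(score):
    """Map a 5-point Likert answer (1..5) onto the [-99, 99] parameter
    range. This linear mapping (3 = neutral = 0) is an assumption."""
    return (score - 3) / 2.0 * 99

def normalized_distance(input_vec, likert_scores):
    """Euclidean distance between the generating input vector and the
    subject's per-parameter estimates, divided by the norm n of eq. (2),
    so that the result always lies in [0, 1]."""
    est = [likert_to_param(s) for s in likert_scores]
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(input_vec, est)))
    return d / NORM
```

Under this sketch, a perfectly recognized neutral input (all parameters 0, all answers 3) yields distance 0, and the averages of about 0.22 reported in Table V would correspond to estimates fairly close to the generating inputs.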
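For completeness, the generation chain described in section II — a least-squares polynomial classifier per facial part, followed by 1-nearest-neighbour matching against the stored configurations — might be sketched as below. This is a simplified, illustrative reconstruction: the training data and configuration values are placeholders, not the actual KOBIAN-R tables, and the exact feature set used by the authors may differ.

```python
import numpy as np

def poly_features(X):
    """Degree-3 polynomial expansion of the 6 input parameters:
    monomials up to the 3rd power plus 2nd-order cross terms."""
    X = np.atleast_2d(X)
    cols = [np.ones(len(X))]
    n = X.shape[1]
    for j in range(n):
        cols += [X[:, j], X[:, j] ** 2, X[:, j] ** 3]
    for a in range(n):
        for b in range(a + 1, n):
            cols.append(X[:, a] * X[:, b])
    return np.stack(cols, axis=1)

class FacialPartClassifier:
    """Linear map on polynomial features, fitted in the least-squares
    sense (in the spirit of the Fisher-based polynomial classifiers
    used in the paper)."""
    def fit(self, F, P):
        Phi = poly_features(F)
        self.W, *_ = np.linalg.lstsq(Phi, P, rcond=None)
        return self

    def predict(self, f):
        return poly_features(f) @ self.W

def nearest_configuration(p, configs):
    """1-nearest neighbour: return the stored configuration c* that is
    closest to the classifier output p (Euclidean distance)."""
    dists = np.linalg.norm(configs - p, axis=1)
    return configs[np.argmin(dists)]
```

The selected configuration c* would then be converted to simulator parameters s* and real motor angles m* through lookup tables, which are not reproduced here.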
B. Results analysis

Data shown in Table III prove that the average recognition rate of set IE, 68.8%, is not much different from that of other related studies done using basic expressions, such as the results shown in [15], which span from 45% to 84%. Particularly significant is the comparison with the evaluation of the 6 basic facial expressions on KOBIAN-R's head, reported in [11], which had an average rate of 68.6%, and with the old head in [6], which had a lower recognition rate.

The second set has a lower average (46.4%), also due to the especially low scores obtained by two specific expressions (Gratitude and Pity) that are confused with each other. Exchanging the two labels, the rates would increase, respectively, to 56.5% and 45.7%. This would bring the average of the whole set to 55.8%. We believe that this set's expressions, being not just emotions but rather communication acts, could have further increased the recognition rate through the use of gestures.

Figure 7. A sample of the evaluation form, showing both emotion / communication act selection and evaluation of parameters through Likert scales. Neutral expression is on the left; the picture to be judged is on the right.

One more interesting aspect is the different scores obtained when dividing subjects by country. Specifically, Japan seems to have the lowest average. For this reason we considered developing expression sets specific to Japan. Expression-specific differences are also present. The most noticeable case is the expression of Awe: 100% recognition by Italians against an average of 36.4% by Asians (China, Japan, Korea and Indonesia). In a West/East comparison, analysis through a t-test on the first two sets shows that there is a gap in recognition (p value: 0.026). The continuation of this work is reported in [25].
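A two-sample comparison of this kind can be sketched as below; the per-subject recognition rates used here are hypothetical placeholders, not the study's raw data, and the paper does not state which t-test variant was used (Welch's unequal-variance form is assumed).

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances), as might be
    used to compare per-subject recognition rates of two groups."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# Hypothetical per-subject recognition rates (fraction of the 12
# labelled expressions recognized) for two groups:
west = [0.75, 0.58, 0.67, 0.75, 0.83, 0.58]
east = [0.42, 0.50, 0.58, 0.33, 0.50, 0.42]
```

The resulting statistic would then be compared against the t distribution (with Welch-Satterthwaite degrees of freedom) to obtain a p-value such as the 0.026 reported above.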
In Table V, results indicate that there is also no significant difference between how unlabelled expressions are evaluated in terms of basic parameters and how labelled ones are. In order to understand whether, in absolute terms, 0.22 is a good or a bad value, it can be argued that answering randomly, results would span from 0.25 to 0.5. The two new parameters introduced in our model were evaluated separately: values of 0.073 for Certainty and 0.198 for Power Relationship show that they are meaningful additions to the model. Further investigation of the emotional model parameters through ANOVA confirms the null hypothesis (F critical value > F) and legitimates the use of the extended model.

Finally, in the last question, "How do you rate the expressiveness of the robot", the average score was 3.56, a value quite close to "Very expressive".

Figure 8. The three sets that have been evaluated. In the top row, from left to right: Apprehension, Annoyance, Love, Awe, Remorse, Hope. In the middle row: Relief, Malice, Disbelief, Gratitude, Pity, Rebuke. In the bottom row, six expressions produced by random inputs.

TABLE III. LABELLED EXPRESSIONS EVALUATION RESULTS

  Set IE                                    Set IC
  Expression     Avg. recognition rate      Expression    Avg. recognition rate
  Apprehension   56.5%                      Relief        50.0%
  Annoyance      82.6%                      Malice        34.8%
  Love           73.9%                      Disbelief     67.4%
  Awe            56.5%                      Gratitude     19.6%
  Remorse        73.9%                      Pity          26.1%
  Hope           69.6%                      Rebuke        80.4%

TABLE IV. LABELLED EXPRESSIONS EVALUATION RESULTS BY NATIONALITY

  Continent                    Number of participants   Avg. recognition rate
  Europe / America / Oceania   25                       67.0%
  Asia                         22                       47.3%

TABLE V. UNLABELLED EXPRESSIONS EVALUATION RESULTS

  Set   Avg. normalised distance   Standard deviation
  IC    0.217                      0.065
  IR    0.222                      0.045

In conclusion, the presented approach, using continuous mixtures of expressions, provides the advantage that more fine-grained differences between expressions can be realised. However, judging these complex expressions can be tricky. We believe that the results obtained so far can
prove that the system is working as intended, as users are "reading" in KOBIAN-R's face its feelings and communication intentions most of the time.

IV. CONCLUSIONS

We have proposed a system that generates facial expressions for the humanoid robot KOBIAN-R, choosing a combination of facial cues rather than using predefined patterns for each emotion. The parameters involved in the generation are taken from a model that describes the emotions and communication acts of the robot. This model is based on relevant studies on human anatomy and psychology. The generator system makes use of polynomial classifiers.

We have evaluated this system by survey, using photos of expressions. Results revealed that human users can effectively read in the robot's face the meaning of the expression, in a way that is comparable to the recognition of the most basic facial expressions. This confirms that this generator can represent a step forward towards Human-Robot Interaction.

Results obtained so far encourage us to expand this system in several directions. One possible expansion is to link the communication act model parameters to a dialogue system: in such a case, during a conversation, the mood of the robot would dynamically change, and the degree of certainty, as well as the emotion, could be read on the robot's face. Expansion to a full body system could also be considered. The facial expression generator itself can be improved through the ability to generate asymmetrical expressions. Finally, the impact of human subjects' nationality provides further insights on recognition rates. In case of successful developments, this system, with a different facial cues mapping, may be applied to a wider range of humanoid robots.

ACKNOWLEDGMENT

This study was conducted as part of the Research Institute for Science and Engineering, Waseda University, and as part of the humanoid project at the Humanoid Robotics Institute, Waseda University.

REFERENCES

[1] M. Knapp, "Essentials of Nonverbal Communication", Holt, Rinehart and Winston, New York, 1980
[2] A. Mehrabian, M. Wiener, "Decoding of inconsistent communications", Journal of Personality and Social Psychology, vol. 6, no. 1, pp. 109-114, 1967
[3] R. Beira, M. Lopes, M. Praça, J. Santos-Victor, A. Bernardino, G. Metta, F. Becchi, R.J. Saltarén, "Design of the Robot-cub (iCub) Head", in: ICRA, pp. 94-100, 2006
[4] J.H. Oh, D. Hanson, W.S. Kim, I.Y. Han, J.Y. Kim, I.W. Park, "Design of Android type Humanoid Robot Albert HUBO", in: Proceedings of IROS, 2006
[5] H. Miwa, T. Okuchi, H. Takanobu, A. Takanishi, "Development of a new human-like head robot WE-4", in: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2443-2448, 2002
[6] M. Zecca, N. Endo, S. Momoki, K. Itoh, A. Takanishi, "Design of the humanoid robot KOBIAN - preliminary analysis of facial and whole body emotion expression capabilities", in: 8th IEEE-RAS International Conference on Humanoid Robots, pp. 487-492, 2008
[7] C. Breazeal, "Emotion and Sociable Humanoid Robots", International Journal of Human Computer Interaction 59: pp. 119-155, 2002
[8] A. Beck, A. Hiolle, A. Mazel, L. Cañamero, "Interpretation of Emotional Body Language Displayed by Robots", in: Proceedings of the 3rd International Workshop on Affective Interaction in Natural Environments, 2010
[9] K. Itoh, H. Miwa, M. Zecca, H. Takanobu, S. Roccella, M.C. Carrozza, P. Dario, A. Takanishi, "Mechanical Design of Emotion Expression Humanoid Robot WE-4RII", in: 16th CISM-IFToMM Symposium on Robot Design, Dynamics and Control, pp. 255-262, 2006
[10] K. Hashimoto, Y. Takezaki, K. Hattori, H. Kondo, T. Takashima, H.O. Lim, A. Takanishi, "A Study of Function of the Human's Foot Arch Structure Using Biped Humanoid Robot", in: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2206-2211, 2010
[11] T. Kishi et al., "Development of Expressive Robotic Head for Bipedal Humanoid Robot with Wide Moveable Range of Facial Parts and Facial Color", in: Proceedings of the 19th CISM-IFToMM Symposium on Robot Design, Dynamics and Control, 2012 (in press)
[12] P. Ekman, W.V. Friesen, J.C. Hager, "The Facial Action Coding System", second ed., Weidenfeld & Nicolson, London, 2002
[13] J.C. Hager, P. Ekman, "The Inner and Outer Meanings of Facial Expressions", in: J.T. Cacioppo & R.E. Petty (Eds.), Social Psychophysiology: A Sourcebook, New York: Guilford, 1983
[14] S. Marcos, J. Gomez-Garcia-Bermejo, E. Zalama, "A realistic, virtual head for human-computer interaction", Interacting with Computers 22: pp. 176-192, 2010
[15] J. Saldien, K. Goris, B. Vanderborght, J. Vanderfaeillie, D. Lefeber, "Expressing Emotions with the Social Robot Probo", International Journal of Social Robotics, vol. 2: pp. 377-389, 2010
[16] I. Poggi, "The lexicon of the conductor's face", in: P. McKevitt (Ed.), CSNLP-8 (Cognitive Science and Natural Language Processing), Proceedings of the Workshop on "Language, vision and music", pp. 67-54, 1999
[17] I. Poggi, "Towards the Alphabet and the Lexicon of gesture, gaze and touch", in: Multimodality of Human Communication. Theories, problems and applications, Virtual Symposium edited by P. Bouissac, 2001-2002
[18] I. Poggi, "Le parole del corpo" (in Italian), Carocci, Roma, 2006
[19] H. Kobayashi, Y. Ichikawa, M. Senda, T. Shiba, "Toward rich facial expression by face robot", IEEE International Symposium on Micromechatronics and Human Science, 2002
[20] C.A. Smith, H.S. Scott, "A componential approach to the meaning of facial expressions", in: J. Russell & J. Fernandez-Dols (Eds.), The Psychology of Facial Expression, Cambridge University Press, Cambridge, UK, pp. 229-254, 1997
[21] D.L. Bimler, G.V. Paramei, "Facial-Expression Affective Attributes and their Configural Correlates: Components and Categories", The Spanish Journal of Psychology, vol. 9, no. 1, pp. 19-31, 2009
[22] I. Poggi, C. Pelachaud, "Performative Facial Expressions in Animated Faces", in: J. Cassell, J. Sullivan, S. Prevost, E. Churchill (Eds.), Embodied Conversational Agents, Cambridge: MIT Press, 2000
[23] R. Plutchik, "Emotions and Life: Perspectives from Psychology, Biology, and Evolution", Washington, DC: American Psychological Association, 2002
[24] E. Douglas-Cowie et al., HUMAINE project, "Mid Term Report on Database Exemplar Progress". Available: http://emotion-research.net/projects/humaine/deliverables/D5g%20final.pdf
[25] G. Trovato et al., "A Cross-Cultural Study on Generation of Culture Dependent Facial Expressions of Humanoid Social Robot", to appear in: ICSR, Chengdu, China (October 2012)