Using Spatial and Temporal Contrast for Fluent Robot-Human Hand-overs

Maya Cakmak (1), Siddhartha S. Srinivasa (2), Min Kyung Lee (3), Sara Kiesler (3), Jodi Forlizzi (3)

(1) School of Interactive Computing, Georgia Inst. of Technology, 801 Atlantic Dr., Atlanta, GA. maya@cc.gatech.edu
(2) Intel Labs Pittsburgh, 4720 Forbes Ave., Pittsburgh, PA. siddhartha.srinivasa@intel.com
(3) Human Computer Interaction Inst., Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA. {mklee,kiesler,forlizzi}@cs.cmu.edu
ABSTRACT
For robots to be integrated into daily tasks assisting humans,
robot-human interactions will need to reach a level of fluency
close to that of human-human interactions. In this paper we
address the fluency of robot-human hand-overs. From an observational study with our robot HERB, we identify the key
problems with a baseline hand-over action. We find that
the failure to convey the intention of handing over causes
delays in the transfer, while the lack of an intuitive signal to
indicate timing of the hand-over causes early, unsuccessful
attempts to take the object. We propose to address these
problems with the use of spatial contrast, in the form of distinct hand-over poses, and temporal contrast, in the form of
unambiguous transitions to the hand-over pose. We conduct
a survey to identify distinct hand-over poses, and determine the variables of the pose that have the most communicative potential for the intent of handing over. We present an experiment
that analyzes the effect of the two types of contrast on the
fluency of hand-overs. We find that temporal contrast is
particularly useful in improving fluency by eliminating early
attempts of the human.
Figure 1: HERB (Home Exploring Robotic Butler)
handing over a drink (a-b) during a public demonstration and (c-d) during our experiment for investigating effects of spatial and temporal contrast.
Categories and Subject Descriptors
I.2.9 [Artificial Intelligence]: Robotics; H.1.2 [Models
and Principles]: User/Machine Systems
General Terms
Design, Experimentation
Keywords
Robot-human hand-overs, fluency
1. INTRODUCTION
Handing over different objects to humans is a key functionality for robots that will assist or cooperate with humans. A robot could fetch objects for the elderly living in their homes or hand tools to a worker in a factory. While there are infinite ways that a robot can transfer an object to a human, including very simple ones, achieving this efficiently and fluently is a challenge.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
HRI’11, March 6–9, 2011, Lausanne, Switzerland.
Copyright 2011 ACM 978-1-4503-0561-7/11/03 ...$10.00.
Humans carry out seamless hand-overs on a daily basis
with a variety of objects from credit-cards to drinks. Yet it is
often difficult for us to remember these instances or identify
how exactly we hand over particular objects. This indicates
that hand-overs are automatic and do not require much deliberation for humans. Furthermore, there is a remarkable
coordination in the movements of the giver and the receiver
during a hand-over [28, 15, 14]. This indicates that humans
are good at anticipating the timing of a hand-over from the
way an object is presented, as well as presenting it in a way
that lets the other understand their intent and synchronize
their movements. Our long-term goal is to reach this level
of fluency in hand-overs between humans and robots.
Fluency in human-robot interactions has been studied in the context of collaborative task execution [13], identifying several quantitative measures of fluency that correlate with the human’s sense of fluency. In a fluent hand-over, neither the robot nor the human has to wait for the other, resulting in an efficient execution of the overall task. Furthermore, possible inefficiencies during the hand-over, such as unpredicted movements or failed attempts to take the object, must be eliminated to provide smooth hand-overs and avoid negative influences on the human’s sense of fluency.
Towards our goal of fluent robot-human hand-overs, we
propose to use contrast in the design of a robot’s poses and
movements for its hand-over interaction. We present two
ways in which the fluency of a hand-over interaction can be
improved. First, we believe humans will be more responsive to the robot if they can easily interpret its intentions.
We propose to achieve this by making the robot’s hand-over
poses distinct from poses that the robot might have during a different action with the object. We refer to this as
spatial contrast. Second, we believe that the coordination
of the hand-over can be improved by making the timing of
the hand-over predictable for the human using an intuitive
signal. We propose using the robot’s movements to signal
the moment of hand-over by transitioning from a pose that
is perceived as non-handing to a pose that is perceived as
handing. We refer to this as temporal contrast.
In this paper we first present an observational study that led us to the proposed approach. This involves simple hand-overs of a drink bottle in unconstrained interactions during an all-day public demonstration of our robot HERB. Second, we present results from a survey that aims at identifying robot poses that are perceived as handing over. Finally, we present a human-robot interaction experiment with 24 subjects, in which we investigate the effects of spatial and temporal contrast on the fluency of the hand-over. Our experiment demonstrates that temporal contrast, in particular, can improve the fluency of hand-overs by effectively communicating the timing of the hand-over and eliminating early attempts by the human.
2. RELATED WORK
Different aspects of robot-human hand-overs have been
studied within robotics, including motion control and planning [1, 14, 29, 15], grasp planning [21], social interaction [11,
18, 9] and grip forces during hand-over [25, 17]. A few studies involved human subject experiments with hand-overs between a robot and a human [18, 14, 11, 9].
We are particularly interested in how the problem of choosing hand-over poses and trajectories has been addressed in
the literature. One approach is to optimally plan the hand-over pose and trajectory using an objective function. A
hand-over motion planner that uses safety, visibility and
comfort in the value function is presented in [29]. A hand-over motion controller that adapts to unexpected arm movements of a simulated human is presented in [1]. A different
approach is to use human evaluation. [18] analyzes human
preferences on the robot’s hand-over behaviors in terms of
the approach direction, height and distance of the object.
User preferences between two velocity profiles for handing over are analyzed in [14] in terms of participants’ ratings of human-likeness and feeling of safety.
Hand-overs between two humans have also been studied
in the literature, some with an eye towards implications for
robot-human hand-overs [23, 28, 3, 14, 15]. Trajectories and
velocity profiles adopted by humans both in the role of giver
and receiver are analyzed in [28]. Simulation results for a
controller that mimics the characteristics of human hand-overs are presented in [15]. [14] analyzes the efficiency of
hand-overs in terms of the durations of three phases during a hand-over, and compares human-human hand-overs
with robot-human hand-overs. The social modification of
pick-and-place movements is demonstrated in [3] comparing
velocity profiles for placing an object on a container versus
another person’s palm. [2] analyzes human approach and hand-over and observes a preparatory movement of lifting the object before the hand-over, which might play an important role in signaling the timing of the hand-over.
We believe that communicating the robot’s intent is crucial to the fluency of hand-overs. Expressing intentions of a
robot has been addressed in the literature using gaze [24],
speech [12], facial expression [27] and body movements [26,
16]. Expressivity has also been addressed in computer animation, mostly within the context of gestures [6, 22].
Our notion of contrast is closely related to exaggeration
in computer animation. This refers to accentuating certain
properties of a scene, including movements, by presenting it
in a wilder, more extreme form [19]. The role of exaggerated movements in communication of intent is supported by
psychological evidence for mothers’ modification of actions
to facilitate infants’ processing, referred to as motionese [5].
3. APPROACH
In this section we describe the framework of our studies,
define fluency and describe our approach for using contrast.
3.1 Platform
Our research platform is HERB (Home Exploring Robot
Platform) (Fig.1) developed at Intel Labs Pittsburgh for personal assistance tasks in home environments [30]. HERB has
two 7-DoF WAM arms, each with a 4-DoF Barrett hand with
three fingers. The WAM arms provide position and torque
sensing on all joints. Additionally their stiffness can be set
to an arbitrary value between 0 (corresponding to maximally
passive by means of actively compensating for gravity) and
1 (corresponding to maximally stiff by means of locking the
joints). The sensing for objects being pulled from HERB’s
hand is based on end effector displacements detected while
the arm has low stiffness. HERB has a mobile Segway base
and is capable of safe, autonomous navigation.
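The displacement-based pull sensing described above can be sketched as follows. This is an illustrative reconstruction, not HERB's actual code; the Cartesian pose representation and the 3 cm threshold are assumptions.

```python
# Illustrative sketch of pull detection: with the arm at low stiffness,
# an end-effector displacement away from the commanded hand-over position
# is read as the receiver pulling the object. Threshold value is assumed.

def pull_detected(commanded_pos, measured_pos, threshold_m=0.03):
    """Return True if the end effector moved beyond threshold_m metres."""
    dist = sum((c - m) ** 2 for c, m in zip(commanded_pos, measured_pos)) ** 0.5
    return dist > threshold_m

# A 5 cm displacement along x exceeds the assumed 3 cm threshold.
print(pull_detected((0.60, 0.00, 1.00), (0.65, 0.00, 1.00)))  # True
```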
3.2 Hand-over actions for robots
We refer to an action triggered by the robot to satisfy the
goal of transferring an object to a human as a hand-over
action. A hand-over action on HERB is implemented as a
sequence of three phases:
• Approach: The robot navigates towards the receiver with
the object in its hand while its arm is configured in a carrying pose. It stops when it reaches a certain position relative
to the receiver.
• Signal: The robot moves its arm from the carrying pose
to a hand-over pose to signal that it is ready to hand-over.
• Release: The robot waits until it senses the object being pulled and opens its hand to release it. The robot then
moves its arm to a neutral position and closes its hand.
We assume that the object is handed to the robot by someone prior to the hand-over action and that the arm is configured in a carrying pose before starting to approach.
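The three phases can be sketched as a simple command sequence. The robot interface below is a hypothetical stub for illustration, not HERB's actual API.

```python
class RobotStub:
    """Hypothetical stand-in for the robot interface; records commands."""
    def __init__(self):
        self.log = []
    def set_arm_pose(self, pose):
        self.log.append(("arm", pose))
    def navigate_to(self, receiver):
        self.log.append(("navigate", receiver))
    def wait_for_pull(self):
        self.log.append(("wait_for_pull",))
    def open_hand(self):
        self.log.append(("open_hand",))
    def close_hand(self):
        self.log.append(("close_hand",))

def hand_over(robot, carrying_pose, hand_over_pose, receiver):
    # Approach: navigate towards the receiver with the arm in the carrying pose.
    robot.set_arm_pose(carrying_pose)
    robot.navigate_to(receiver)
    # Signal: move the arm to the hand-over pose to show readiness.
    robot.set_arm_pose(hand_over_pose)
    # Release: wait for a pull on the object, open the hand, then retract.
    robot.wait_for_pull()
    robot.open_hand()
    robot.set_arm_pose("neutral")
    robot.close_hand()

robot = RobotStub()
hand_over(robot, "carry", "hand-over", "receiver")
print([entry[0] for entry in robot.log])
# ['arm', 'navigate', 'arm', 'wait_for_pull', 'open_hand', 'arm', 'close_hand']
```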
In this framework, variations of the hand-over action are
obtained by changing the carrying and hand-over poses. The
hand-over pose determines the spatial characteristics of the
hand-over since the object is intended to be transferred at
this pose. The transition from the carrying pose to the hand-over pose determines the temporal characteristics, which can
be manipulated by changing the carrying and hand-over
poses. In this study, all trajectories between poses are obtained using the path planning algorithm described in [4].
The speed of the arm during transitions is kept constant.
3.4 Using Contrast to Design Hand-overs
We propose using contrast in the poses and the movements
of the robot in order to improve fluency of hand-overs.
• Spatial contrast refers to the distinctness of the pose with
which the object is presented to the person as compared to
other things that the robot might do with an object in its
hand. A hand-over pose with high spatial contrast is a distinct pose that conveys the intent of handing over.
• Temporal contrast refers to the distinctness of the pose
with which the object is presented to the person as compared
to the robot’s previous pose. A transition to the hand-over
pose has high temporal contrast if the carrying pose is distinctly different from the hand-over pose.
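The paper treats both kinds of contrast qualitatively. One illustrative way to quantify them (our assumption, not a definition from the text) is distance in the arm's joint space:

```python
import math

def pose_distance(q1, q2):
    """Euclidean distance between two joint configurations (radians)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)))

def temporal_contrast(carry_q, hand_over_q):
    # High when the carrying pose is distinctly different from the
    # hand-over pose, so the transition is easy to notice.
    return pose_distance(carry_q, hand_over_q)

def spatial_contrast(hand_over_q, other_action_qs):
    # High when the hand-over pose is far from every pose the robot
    # uses for other actions with an object (holding, showing, ...).
    return min(pose_distance(hand_over_q, q) for q in other_action_qs)

carry = (0.0, 0.5, 0.0)          # toy 3-joint configurations
hand_over = (0.0, 1.5, 0.3)
others = [(0.0, 0.4, 0.0), (0.1, 0.6, 0.1)]
print(round(temporal_contrast(carry, hand_over), 3))
print(round(spatial_contrast(hand_over, others), 3))
```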
4. OBSERVATIONS ON FLUENCY
We first present an observational study on fluency in robot-human hand-overs during the demonstration of our robot
HERB at the Research at Intel day, 2010. In this demonstration the robot’s hand-over action has neither spatial nor
temporal contrast. We present observations that motivate
the need for both types of contrast.
4.1 Description
In this demonstration HERB hands a drink bottle to a
human as part of a drink delivery task (Fig. 1(a-b)). The
robot stands near a table on which drinks are made available.
It starts by grabbing a drink from the table and turns 90°
towards the side where the demonstrator solicits visitors.
Then, it says “Please take the drink” and starts waiting for
a pull on the arm holding the object. This is a simpler
version of the hand-over action described in Sec. 3.2 where
the arm movement signal is replaced with a vocal signal. If
the object is not pulled from the robot’s hand for 10 seconds, the robot turns another 90° to drop the drink in a bin.
Before the robot starts the task the visitors are briefed about what the robot will be doing. They are told that
the robot plays the role of a bartender and that it can give
them a drink if they want it. If they do ask for a drink they
are told to pull the drink when the robot presents it to them.
3.3 Fluency in hand-overs
A hand-over ideally happens as soon as the robot is ready to release the object. If the human is not ready to take the object at that moment, the robot will need to wait for the human. The opposite can also happen: the human can stop what they are doing in order to take the object while the robot is not ready to release the object. As a result, the human will need to wait until the robot is ready. A fluent hand-over minimizes both the robot’s and the human’s waiting durations. This notion of fluency resembles the functional delay defined in [13].

4.2 Analysis

Table 1: Distribution of HERB’s hand-over attempts during the demonstration. Refer to text for a description of the categories.

Time-out   Experienced   Novice: Early   Novice: Prompt   Novice: Success
   28           90             15               7                7

HERB’s interactions with visitors are recorded from two different camera views. Hand-over attempts by the robot
are separated into four groups: (i) ones in which there is an
error or no visitor is present in the vicinity of the robot, (ii)
ones in which time-out occurs and the drink is dropped in the
bin, (iii) ones in which the experienced demonstrator takes
the drink from the robot and (iv) ones in which the novice
visitor takes the drink. Within the hand-over attempts that fall into (ii) and (iii) we look for reasons why the robot cannot induce a reaction from the visitor. Within the hand-overs in (iv) we identify (a) the ones in which the visitor
attempts to take the drink too early, (b) the ones in which the
visitor is prompted by the demonstrator to take the object,
and (c) the rest which we label as successful.
4.3 Observations
Table 1 gives the categorization of 147 hand-over attempts.
We make the following observations.
Pose not conveying intent. Even though visitors are told that HERB will give them a drink, several of them do not attempt to take the drink on their own when it is presented. Note that in some cases the visitors might not
on their own. Note that in some cases the visitors might not
have heard or understood the robot’s verbal signal as they
were engaged in a conversation. However even when they
direct their attention towards the robot afterwards, they do
not get a sense that the robot is trying to hand them the
drink. Often they take the drink after the demonstrator
prompts them by saying “You can take the drink now” and
pointing to the drink. This indicates that the posture of the
robot does not give the impression that the robot is trying
to hand the object.
Ambiguous boundary between carry and hand-over. In some cases the receiver is paying close attention to the robot throughout the execution of the task and attempts to take the object too early, while the robot is still moving
or before the verbal signal. As the robot turns toward the
person, the object becomes more and more reachable to the
receiver. Before and during the verbal signal, the object is
already at its final hand-over pose. We believe this is the
main cause for the early attempts by the receiver. In addition to affecting fluency by requiring more of the human’s
time, this results in failed attempts to take the object which
may be frustrating for the human.
Overall we observe that the baseline hand-over action has
several issues in terms of fluency. The failure to convey the intention of hand-over causes delays, time-outs, or the need for prompting. To overcome this issue we propose using spatial contrast. The lack of an intuitive signal that indicates when the robot is ready to hand over causes early failed attempts.
To overcome this issue we propose using temporal contrast.
In addition we observe that whether the receiver is paying
attention to the robot or not has important implications on
how fluent the hand-over will be.
Figure 2: (a) Poses used in the survey to identify distinct hand-over poses. Poses are obtained by varying three
features (Arm extension, Hand position, Object tilt). Possible values are 0:Neutral, +:Positive, –:Negative.
(b) Responses by 50 participants for 15 poses. Light colors indicate a high frequency of being chosen. The choices that were selected more than the others are indicated with squares. (c) Poses that were labelled as handing more than the other choices, and the percentage of subjects who labelled each pose as handing.
5. DISTINCT HAND-OVER POSES
To better convey the robot’s intention, we propose using
hand-over poses that are distinct from other things that the
robot might be perceived to be doing when it has an object
in its hand. We turn to the users for identifying such poses,
since the primary objective is recognizability of the intent by
the user. We present results from an online survey aimed at
identifying such poses and investigate which variables of the
pose are most effective in conveying the hand-over intent.
5.1 Survey design
The survey consists of 15 forced-choice questions asking
the participant to categorize a pose of the robot holding a
drink into an action category. The categories are: (i) Holding or carrying the object, (ii) Handing over or giving the
object to someone, (iii) Looking at the object, (iv) Showing
the object to someone and (v) None of the above, something
else. Participants are shown images of the simulated robot
taken from an isometric perspective in each pose. To avoid
context effects the image contains nothing but the robot. To
give a sense of the size of the robot a picture of the robot
next to a person is included in the instructions. The order
of images is randomized for each subject. All questions are
available in one page such that the participant can change
their response for any pose before submitting.
The poses are generated by changing three variables that
we expect will affect the perception of the pose as a hand-over.
For each variable we use a neutral, positive and negative
value. These are obtained based on our prediction of how
each variable will affect the communication of the hand-over
intention. These variables and their values are as follows:
• Arm extension: In the neutral pose, the object is about
50cm away from the robot in a comfortable position. In
the positive pose the arm is fully extended and the object is
about 80cm away. In the negative pose the object is about
20cm away.
• Tilt: In the neutral pose, the object is in an upright position. In the positive pose the object is tilted away from the robot by 45° (towards a potential receiver). In the negative pose the object is tilted towards the robot.
• Grasp location: In the neutral case the robot holds the
object from the side, in the positive case from the back (as
to expose the object to a potential receiver) and in the negative case from the front (as to obstruct the object from a
potential receiver).
The 15 poses consist of the following combinations of property values: 1 pose in which all properties have the neutral
value, 6 poses in which one property has a positive or negative value, 6 poses in which two properties are both positive
or both negative and 2 poses in which all properties are positive or negative. These poses are shown in Fig. 2(a).
5.2 Results
The distribution of choices by 50 participants over the 15
images are shown on Fig. 2(b) indicating the choices that
were preferred more than the others. In all four poses that
were tagged mostly as handing, we observe that the robot’s
arm is extended. A chi-square feature analysis [20] (between
handing versus all the other choices) supports the observation that arm extension is the most important feature for
communicating the hand-over intention, followed by hand
position (χ2 =155.60 for arm extension, χ2 =100.51 for hand
position, χ2 =46.41 for object tilt).
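As an illustration of the kind of chi-square feature analysis referenced here, the statistic for a feature-value versus handing/not-handing contingency table can be computed as follows. The counts in the example are made up, not the survey data:

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = labelled handing / not handing,
# columns = arm extended / not extended.
print(round(chi_square([[40, 10], [15, 35]]), 2))  # 25.25
```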
6. EXPERIMENT
We performed an experiment to analyze the effects of spatial and temporal contrast as well as the effect of the receiver’s attentional state on the fluency of hand-overs.
Experimental setup.
In our experiment HERB hands a drink bottle to the subject from the side while they are sitting on a tall chair in
front of a computer screen (Fig. 1(c-d)). The robot starts
facing away from the person and takes the drink bottle from
the experimenter. It configures its arm in the carrying pose,
turns 180° and moves a certain distance towards the person. It then moves to the hand-over pose and waits for a
pull. The object is always presented at the same location
from the right side of the subject. Therefore if the arm is
not extended in the hand-over pose, the robot gets closer
to the person. After the bottle is taken by the subject the
robot moves to a neutral position and goes back to the starting point to deliver the next drink. The grasp of the bottle
is exactly the same in all cases – it is a power grasp at the
bottom of the bottle.
Experimental design.
Our experiment aims at analyzing the effects of using spatial and temporal contrast in designing hand-overs. We consider hand-overs with different combinations of whether or
not each type of contrast exists. This results in four conditions which differ in whether the hand-over pose is distinct
or not (spatial contrast) and whether the transition to the
hand-over pose is distinct or not (temporal contrast). We
refer to the four conditions as follows (Fig. 3): spatial contrast – temporal contrast (CC), spatial contrast – no temporal contrast (CN), no spatial contrast – temporal contrast
(NC), no spatial contrast – no temporal contrast (NN).
Distinct and indistinct hand-over poses are obtained based
on the results of the survey explained in Section 5. In order
to keep the position and orientation in which the object is
presented fixed across conditions we choose hand-over poses
that differ only in arm extension and hand position. As the
distinct hand-over pose, we use a positive arm extension and
hand position (Fig. 2(a)). As an indistinct hand-over pose
we use neutral values for both variables. High temporal
contrast is produced using a distinct non-hand-over pose as
the carrying pose. This carrying pose has negative values for
arm extension and hand position. Low temporal contrast is
produced using a carrying pose in which the end-effector is
moved 10cm towards the robot from the hand-over pose.
In order to account for whether the person is paying attention to the robot during the hand-over we perform our
experiment in two groups. The available group is asked to
pay attention to the robot while it is approaching. The busy
group is asked to perform a task throughout the experiment
such that they do not pay attention to the robot while it is
approaching. To keep the subjects busy we use a continuous
performance task. We use an open source implementation1
of Conner’s continuous performance test [10]. This involves
responding to characters that appear on a black screen by
pressing the space bar on the keyboard, except when the
character is an ‘X’. The frequency with which characters
appear is varied between 1.2 and 1.4sec.
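The task rule can be sketched as follows. This is a minimal reconstruction of the rule, not the PEBL implementation used in the study:

```python
import random

def should_press_space(char):
    """In the continuous performance task, respond to every character except 'X'."""
    return char != "X"

def inter_stimulus_interval(rng):
    """Delay before the next character, varied between 1.2 and 1.4 seconds."""
    return rng.uniform(1.2, 1.4)

rng = random.Random(0)
print(should_press_space("A"), should_press_space("X"))  # True False
```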
As a result we have a mixed factorial design experiment
with three factors. Spatial and temporal contrast are repeated measure factors, while attentional state of the receiver (available or busy) is a between groups factor. Each
subject carries out a hand-over in the four conditions twice,
resulting in a total of 8 hand-overs per subject. The order
of the four conditions is counterbalanced across subjects.
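The paper does not state which counterbalancing scheme was used; one standard choice for four conditions, shown here as a possibility rather than the study's actual procedure, is a balanced Latin square, which places each condition in each position equally often:

```python
def balanced_latin_square(items):
    """Orders for len(items) subjects; valid for an even number of items."""
    n = len(items)
    rows = []
    for r in range(n):
        seq, low, high = [], 0, 1
        for i in range(n):
            if i % 2 == 0:
                seq.append(items[(r + low) % n])   # step backwards
                low -= 1
            else:
                seq.append(items[(r + high) % n])  # step forwards
                high += 1
        rows.append(seq)
    return rows

for order in balanced_latin_square(["CC", "CN", "NC", "NN"]):
    print(order)
```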
Figure 3: Four conditions for testing spatial and temporal contrast.
Procedure.
Prior to the experiment subjects are given some experience of taking the object from the robot such that they know how much to pull the object. During these trials the robot
says “Please take the object” to indicate when it is ready
to hand-over. The subject is told that during the experiment the robot will not use this verbal signal so they need
to decide when to take the object. Subjects in both groups
are told to take the object as soon as possible. Subjects are
asked to use their right hand while taking the object from
the robot. Subjects in the busy group are told to use their
right hand also for pressing the space bar and to not use
their left hand to press the space bar at any time.
Evaluation.
We evaluate hand-overs in different groups and conditions in terms of their fluency. The timing of two events is determined from video recordings of the interactions: the moment the subject’s hand starts moving to take the object from the robot (tmove) and the moment they contact the object (ttouch).
Other timing information is obtained from the logs of the robot’s internal state: the moment the robot starts moving its arm from the carry pose to the hand-over pose (tsignal), the moment that the robot starts waiting for the pull (tready), and the moment that the person takes the object (ttransfer). Our main measures of fluency are the waiting durations of the robot (ttransfer–tready) and the human (ttransfer–ttouch). In our analysis we use the second set of 4 interactions out of the 8 in order to exclude the effects of unfamiliarity in the very first interaction. In addition subjects are
given an exit survey including the question: Did you notice
any difference in the way that HERB presented the object to
you? Please explain.
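The two waiting measures follow directly from the logged timestamps. The numbers in the example below are made up:

```python
def fluency_measures(t_ready, t_touch, t_transfer):
    """Waiting durations (seconds) defined from the logged event times."""
    robot_wait = t_transfer - t_ready   # robot holds the object, waiting for the pull
    human_wait = t_transfer - t_touch   # human touches the object, waiting for release
    return robot_wait, human_wait

# Made-up timestamps: robot ready at 3.0 s, touch at 4.5 s, transfer at 5.0 s.
print(fluency_measures(3.0, 4.5, 5.0))  # (2.0, 0.5)
```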
Hypotheses.
We expect that the intention of handing over can be communicated better with distinct hand-over poses, reducing the time that the robot waits for the person to take the object. We also hypothesize that by using temporal contrast the intended moment of transfer can be communicated better, reducing the time that the person waits for the robot to give the object and avoiding unsuccessful attempts.

1 http://pebl.sourceforge.net/battery.html

7. RESULTS
Our experiment was completed by 24 subjects (9 female,
15 male, between the ages of 20-45). Subjects were equally
assigned to available and busy groups. The average robot
and human waiting times for each condition individually,
and collapsed for each factor are given in Fig. 4.

Figure 4: Average robot and human waiting times for each of the 8 conditions (2×2×2) and collapsed into two groups for each factor.

Table 2: Results of the mixed factor three-way ANOVA for robot and human waiting durations. The three factors are attention (A), spatial contrast (SC) and temporal contrast (TC).

        Robot wait time            Human wait time
A       F(1,22)=3.24, p>.05        F(1,22)=14.55, p<.005*
SC      F(1,22)=0.97, p>.05        F(1,22)=1.16, p>.05
TC      F(1,22)=0.82, p>.05        F(1,22)=9.05, p<.005*
A×SC    F(1,22)=0.03, p>.05        F(1,22)=0.59, p>.05
A×TC    F(1,22)=0.14, p>.05        F(1,22)=0.05, p>.05
SC×TC   F(1,22)=0.13, p>.05        F(1,22)=1.54, p>.05

We perform a mixed factor three-way ANOVA on robot and human waiting durations with two repeated measure factors (temporal and spatial contrast in hand-overs) and one between-subjects factor (attention to robot). The results are given
in Table 2. We find supporting evidence for our hypothesis about temporal contrast, whereas our hypothesis about spatial contrast is not supported by our experiment. These
results are summarized as follows.
Effect of temporal contrast.
We find that temporal contrast significantly reduces the
waiting time of the human (Fig. 4(b), Table 2).2 This means
that temporal contrast lets the receivers correctly time their
attempt to take the object and avoid early attempts.
Waiting duration of the human is highest for the CN condition. We observe that 9 subjects in this condition attempted to take the object too early. In the available group
6 subjects tried to take the object while the robot was navigating towards the person. They kept holding the object
until they obtained it. Snapshots from two such incidents
(2) Same statistical results are obtained using ttransfer–tmove as the measure of human waiting time.
are given in Fig. 5. In the busy group 3 subjects moved
their hand to touch the object, went back to the attention task after realizing they could not take it, and tried again later when the robot stopped moving. One subject in the busy group describes this in the survey, saying that he tried
to take the object when “the drink appeared in [his] peripheral vision, but HERB was not yet ready to hand over [so
he] gave up to go press space again”. The same problem was observed for 3 subjects in the NN condition (all in the available group) and never in conditions with temporal contrast. These instances further motivate the benefit
of temporal contrast.
The timeline of events for a subject in the available group
is illustrated in Fig. 6. The subject starts moving her hand
before the robot’s arm starts moving in both conditions with
no temporal contrast (CN, NN). In the CC condition the
subject’s hand moves towards the bottle after the robot’s
arm stops moving. The NC condition demonstrates an instance where the person adapts movement speed as to reach
the object around the time that the robot’s arm stops moving. This indicates that temporal contrast might be helpful
in letting the human anticipate the point of hand-over.
The time wasted by the subjects in conditions with no
temporal contrast (CN, NN) is reflected in their performance
on the attention task in the busy group. We see that subjects miss an average of 2.54 (SD=1.32) stimuli in conditions
with temporal contrast, while they miss an average of 3.05
(SD=1.41) stimuli in conditions with no temporal contrast.
These observations also demonstrate the issues related to
carrying the object in a pose that is perceived as handing.
Although the interaction between temporal and spatial contrast is not significant (Table 2), we see that the CN condition is more problematic than the NN group due to the carrying pose. In other words, spatial contrast in the absence
of temporal contrast might be harmful to the interaction.
Effect of spatial contrast.
There was no significant effect of spatial contrast on robot
waiting time. Our hypothesis was that spatial contrast would
help the robot communicate its intention of handing the object and reduce the waiting time of the robot. We believe
that our experiment was not suited for testing this hypothesis as the subjects were explicitly instructed to take object
from the robot and the robot was not doing anything other
than delivering the drink. Thus subjects did not need to
distinguish the robot’s handing intention from other intentions.

Figure 5: Two examples of early attempts by a subject in the available group, in CN and NN conditions.

Figure 6: Sample timeline of events in four conditions from a subject in the available group.

We believe that a setting where the person does not
expect the hand-over and the robot is doing multiple actions would be more suitable for testing this hypothesis. Note that
temporal contrast might also help reduce robot waiting time
by functioning as an attention grabber in situations where
the person is busy.
While our hypothesis on reducing robot waiting duration
is not supported by our experiment, we believe there is evidence that spatial contrast served its goal of communicating
the intention of handing over an object. We see that when
the robot was approaching the person with an extended arm
in the CN condition, several subjects made early attempts
to take the object. Even though by that point subjects had experience with the hand-over action, the extended arm
of the robot induced a reaction from the human to take the
object. This shows that the extended arm during the approach communicated a handing intention even though there
was no signal from the robot to hand the object.
Note that the robot waiting time for the CN condition is relatively high. As this is the condition in which the human waiting time is highest, one would expect the robot waiting time to be lower. However, we observe two behaviors that result in the contrary. In some cases the subjects fail to obtain the object when they pull, so they stop pulling but keep holding the object and move along with the robot (Fig. 5). Only after the robot stops do they attempt to pull again. In other cases, the subjects unsuccessfully attempt to take the object and give up. To avoid another failed attempt they make sure to wait a sufficient amount of time, thus overcompensating for the failed attempt.
While describing the differences between the hand-overs
in the survey, 5 subjects stated preference for either or both
temporal and spatial contrast. One of them explained that
“[he] liked it when HERB held the bottle close to itself and
not with an outstretched arm while moving [and that this]
helped [him] figure out when it was in the process of handing
the bottle and when it was time for [him] to grab the bottle”.
Another subject said that “[she] preferred when HERB was
further away when it finished driving and started to move
the arm, because when it moved closer [she] got worried that
it was going to continue to drive into [her] or when it moved
its arm that it would hit [her]”. This shows that temporal and spatial contrast are not only useful for fluency but might also be preferred by users and make them feel safer.
Effect of attention.
We find that the waiting time for the receiver is smaller
when the subjects are performing the attention test. This
is not surprising as these subjects are mostly not looking at
the robot while it is approaching or while its arm is moving.
Four subjects in the busy group performed more than half of
the hand-overs without turning their head away from the
computer screen. Even though they are told to take the
object as soon as possible, they often wait for the robot to
come to a complete stop before they attempt to take the
object. Consequently they get the object immediately when
they try to take it and they do not need to wait. A side effect
of this is the noticeable, but not significant, increase in the
robot waiting duration when the subject is busy (Fig. 4(a)).
All subjects in the available group reported in the survey
that they noticed a difference in the way HERB presented
the objects. Their description of the differences referred to
both types of contrast. In the busy group only half of the
subjects noticed a difference in the way the object was presented. Their description of the difference was often limited
to the distance of the robot being different.
There is no significant interaction between attention and
temporal contrast (Table 2). The waiting time is higher for
conditions with no temporal contrast whether the subject is
available or busy. While early attempts occurred less often in the busy group, the average waiting time of the human was also smaller across all conditions. As a result, the difference is preserved.
8. CONCLUSIONS
This paper is motivated by observations of unconstrained
hand-over interactions between novice humans and our robot
HERB during drink deliveries. We see that novices either
do not recognize the robot’s attempt to hand them a drink,
or they attempt to take the drink too early. To address
these issues we propose using contrast in the robot’s actions.
By making the robot’s hand-over pose distinct from other
things that the robot might do with an object in its hand,
the intent of the robot can be conveyed better (spatial contrast). By transitioning to the hand-over pose from a pose
that is clearly non-handing, the timing of the hand-over can
be communicated better (temporal contrast). We present
results from a survey that aims to identify poses that are
perceived as handing over. We find that all three features we proposed were useful in conveying the hand-over intention, and that arm extension was the most effective. These
findings can guide the design of hand-over poses for a range
of different robots and objects.
Finally we present an experiment that investigates the effects of spatial and temporal contrast. We find that temporal contrast improves the fluency of hand-overs by letting the
human synchronize their taking attempts and by eliminating
early failed attempts. This finding suggests that robots can
greatly benefit from concealing the object from the receiver
while carrying it and by transitioning to the hand-over pose
when they are ready to release the object. While we do not see an effect of spatial contrast in this experiment, we believe
that a different setup can capture the usefulness of spatial
contrast. We plan to explore this hypothesis further in the
next public demonstration of our robot as well as with an
experiment that emphasizes recognition of intent.
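The carry-then-transition behavior recommended above can be summarized as a small state machine: carry the object in a clearly non-handing pose, make a single unambiguous transition to the hand-over pose only when ready, and release on grasp. The sketch below is a hypothetical illustration; the state names and sensing predicates are our own, not HERB's actual controller.

```python
from enum import Enum, auto

class State(Enum):
    CARRY = auto()    # object held close, clearly non-handing pose
    SIGNAL = auto()   # contrastive transition: extend arm toward hand-over pose
    HOLD = auto()     # hand-over pose reached; wait for the human
    RELEASE = auto()  # human grasp detected; open gripper

def step(state, at_goal, arm_settled, grasp_detected):
    """One tick of a hypothetical temporal-contrast hand-over controller."""
    if state == State.CARRY and at_goal:
        return State.SIGNAL        # begin the transition only when ready
    if state == State.SIGNAL and arm_settled:
        return State.HOLD          # signal complete: now it is time to take
    if state == State.HOLD and grasp_detected:
        return State.RELEASE
    return state

# Walk through one hand-over:
s = State.CARRY
s = step(s, at_goal=True, arm_settled=False, grasp_detected=False)  # -> SIGNAL
s = step(s, at_goal=True, arm_settled=True, grasp_detected=False)   # -> HOLD
s = step(s, at_goal=True, arm_settled=True, grasp_detected=True)    # -> RELEASE
```

Because the transition out of CARRY happens only once the robot is ready to release, the human never sees the hand-over pose before it is safe to reach, which is exactly the property temporal contrast provided in the experiment.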
Acknowledgments
This work is partially supported by NSF under Grant No.
EEC-0540865. M. Cakmak was partially supported by the
CMU-Intel Summer Fellowship. Special thanks to the members of the Personal Robotics Lab at Intel Pittsburgh for
insightful comments and discussions.