We present an Embodied Conversational Agent (ECA) that incorporates a context-sensitive mechanism for handling user barge-in. The affective ECA engages the user in social conversation, and is fully implemented. We illustrate its operation with actual examples of system behaviour. The ECA is designed to recognise and empathise with the emotional state of the user. It is able to detect, react quickly to, and then follow up with considered responses to different kinds of user interruptions. The design of the rules which enable the ECA to respond intelligently to different types of interruptions was informed by manually analysed real data from human–human dialogue. The rules represent recoveries from interruptions as two-part structures: an address followed by a resumption. The system is robust enough to manage long, multi-utterance turns by both user and system, which creates good opportunities for the user to interrupt while the ECA is speaking.
The graphical user interface for the HWYD? prototype, including the animated agent, was created by Telefonica ID, Madrid.
All inter-component messages sent and received during run-time are recorded in a log file.
The user’s turns are reproduced exactly as output by the ASR, and therefore include some ungrammatical structures.
Recall that we equate an interruption with overlapping/simultaneous speech (Sect. 2).
The domain of the corpus we used was different from that of the HWYD? system, whose domain was social dialogue focusing on one’s day at the office. There was no opportunity at this late stage for us to collect a corpus in that domain, and no existing corpus in that domain was also rich in interruptions.
20070112, 20070119, 20070126.
Owing to the preliminary nature of the work, inter-annotator agreement has not yet been measured.
Ignoring turn content (ign-t-content) was the second most frequent way of recovering from an interruption that we observed in the corpus, the most frequent being to supply information that was requested by the interruption (at 30.65%), which we also modelled (see Sect. 5.2.3).
The system did not recognise Turn (b) as a WH question because it did not begin with a WH word. This is one of many undesirable shortcomings that are currently being addressed.
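The limitation described in this note can be sketched as follows. This is a minimal illustration, not the HWYD? implementation: the word list, the function names and the looser alternative check are all assumptions introduced here, since the note states only that recognition depends on the turn beginning with a WH word.

```python
# Hypothetical word list; the source does not enumerate the WH words checked.
WH_WORDS = {"what", "when", "where", "which", "who", "whom", "whose", "why", "how"}


def _tokens(utterance: str) -> list[str]:
    """Lower-case the utterance, split on whitespace, strip edge punctuation."""
    return [tok.strip(".,?!") for tok in utterance.lower().split()]


def is_wh_question_initial(utterance: str) -> bool:
    """True only if the first token is a WH word (the behaviour described above)."""
    tokens = _tokens(utterance)
    return bool(tokens) and tokens[0] in WH_WORDS


def is_wh_question_anywhere(utterance: str) -> bool:
    """Looser check (an assumption, not the authors' fix): a WH word in any position."""
    return any(tok in WH_WORDS for tok in _tokens(utterance))
```

Under these assumptions, a turn such as "and what did he say" is missed by the initial-word check but caught by the looser one. The looser check would also fire on non-questions containing a relative pronoun (for example "the man who left"), so it trades missed questions for false positives.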
Additionally, we note that much of the interruptions literature considers many interruptions to be hostile, in that the interruptor snatches the conversational floor before it is his turn. It is therefore reasonable to posit that if a user interrupts the system, the system may have said something that was not well received by the user, which adds weight to the appropriateness of an apology in an interruption recovery.
This work was partially funded by the COMPANIONS project (http://www.companions-project.org) sponsored by the European Commission (EC) as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-034434. We thank the University of Augsburg (Prof. Elisabeth André) for supplying a version of the EmoVoice [25] system. Other contributors to the prototype described in this paper are Ramon Granell, Simon Dobnik, Karo Moilanen and Manjari Chandran-Ramesh (University of Oxford), Raúl Santos de la Cámara (Telefonica ID, Madrid), Markku Turunen (University of Tampere) and Enrico Zovato (Loquendo, Torino).
Appendix: Interruption and Recovery Types
