This paper explores the role of gaze in coordinating turn-taking in mixed-initiative conversation and specifically how gaze indicators might be usefully modeled in computational dialogue systems. We analyzed about 20 minutes of videotape of eight dialogues by four pairs of ...
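As a rough illustration of how such gaze indicators might be operationalized in a dialogue system, the following Python sketch folds a hypothetical gaze cue into a turn-taking decision; the feature names and the 0.5 s pause threshold are illustrative assumptions, not values from the study.

```python
from dataclasses import dataclass

# Hypothetical gaze cues a vision module might report near a pause; the
# feature names and the 0.5 s threshold are illustrative, not from the study.
@dataclass
class GazeCues:
    speaker_looks_at_listener: bool  # gaze toward the addressee often marks a turn yield
    mutual_gaze_duration_s: float    # sustained mutual gaze around the pause

def listener_may_take_turn(cues: GazeCues, pause_s: float) -> bool:
    """Crude rule: speaker gaze at the listener plus a sufficient pause."""
    return cues.speaker_looks_at_listener and pause_s > 0.5

print(listener_may_take_turn(GazeCues(True, 1.2), pause_s=0.8))  # True
```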
The paper presents a study on question answering systems evaluation. The purpose of the study is to determine whether human evaluation is indeed necessary to qualitatively measure the performance of a sociomedical dialogue system. The study is based on data from several natural language processing experiments conducted with a question answering dataset for the inclusion of people with autism spectrum disorder and state-of-the-art models with the Transformer architecture. The study describes model-centric experiments on generative and extractive question answering and data-centric experiments on dataset tuning. The purpose of both the model- and data-centric approaches is to reach the highest F1-Score. Although F1-Score and Exact Match are well-known automated evaluation metrics for question answering, their reliability in measuring the performance of sociomedical systems, in which outputs should be not only consistent but also psychologically safe, is questionable. With this in mind, the author of the paper experimented with human evaluation of a dialogue system for inclusion developed in the previous phase of the work. The result of the study is an analysis of the advantages and disadvantages of automated and human approaches to evaluating conversational artificial intelligence systems in which the psychological safety of the user is essential.
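For reference, the automated metrics the study questions can be computed as below. This is a minimal sketch of SQuAD-style Exact Match and token-level F1 (the standard evaluation scripts also normalize punctuation and articles), and it makes the abstract's point concrete: a prediction can score well on token overlap while saying something psychologically unsafe.

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    """1.0 iff the normalized strings match exactly."""
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 as used in SQuAD-style QA evaluation."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("a calm reply", "a calm and safe reply"))  # 0.75
```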
This report surveys the state of the art in dialogue management. We first give an overview of a multimodal dialogue system and its components. Second, four main approaches to dialogue management are described (finite-state and frame-based, information-state-based and probabilistic, plan-based, and collaborative agent-based approaches). Finally, dialogue management in recent dialogue systems is presented.
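A finite-state approach, the simplest of the four, can be sketched in a few lines; the states, dialogue acts, and transitions below are invented for illustration rather than taken from the report.

```python
# A minimal finite-state dialogue manager: the dialogue is a walk through a
# fixed transition table keyed by (state, dialogue act).
TRANSITIONS = {
    ("greet", "user_hello"): "ask_destination",
    ("ask_destination", "destination_given"): "confirm",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "ask_destination",
}

def next_state(state: str, dialogue_act: str) -> str:
    # Stay in the current state for unexpected acts (i.e., re-prompt).
    return TRANSITIONS.get((state, dialogue_act), state)

state = "greet"
for act in ["user_hello", "destination_given", "no", "destination_given", "yes"]:
    state = next_state(state, act)
print(state)  # done
```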
This paper describes a substantial effort to build a real-time interactive multimodal dialogue system with a focus on emotional and nonverbal interaction capabilities. The work is motivated by the aim to provide technology with competences in perceiving and producing the emotional and nonverbal behaviors required to sustain a conversational dialogue. We present the Sensitive Artificial Listener (SAL) scenario as a setting which seems particularly suited for the study of emotional and nonverbal behavior since it requires only very limited verbal understanding on the part of the machine. This scenario allows us to concentrate on nonverbal capabilities without having to address at the same time the challenges of spoken language understanding, task modeling, etc. We first report on three prototype versions of the SAL scenario in which the behavior of the Sensitive Artificial Listener characters was determined by a human operator. These prototypes served the purpose of verifying the effectiveness of the SAL scenario and allowed us to collect data required for building system components for analyzing and synthesizing the respective behaviors. We then describe the fully autonomous integrated real-time system we created, which combines incremental analysis of user behavior, dialogue management, and synthesis of speaker and listener behavior of a SAL character displayed as a virtual agent. We discuss principles that should underlie the evaluation of SAL-type systems. Since the system is designed for modularity and reuse and since it is publicly available, the SAL system has potential as a joint research tool in the affective computing research community.
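One way to picture the autonomous listener loop is the toy sketch below: perceived user affect is mapped to a backchannel behavior emitted at a suitable moment. The affect categories and behavior inventory are assumptions, far simpler than the actual SAL components.

```python
import random

# Illustrative mapping from perceived user affect to listener responses;
# the categories and behaviors are assumptions, not the SAL inventory.
LISTENER_BEHAVIOR = {
    "positive": ["nod", "smile", "mm-hm"],
    "negative": ["lean_in", "concerned_frown", "oh?"],
    "neutral": ["nod", "mm-hm"],
}

def choose_backchannel(perceived_affect: str, user_is_pausing: bool) -> str | None:
    """Emit a listener behavior only at a suitable moment (here: a pause)."""
    if not user_is_pausing:
        return None
    return random.choice(LISTENER_BEHAVIOR.get(perceived_affect, ["nod"]))

print(choose_backchannel("positive", user_is_pausing=True))
```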
In this paper we describe a dialogue system which makes use of the notion of presuppositions to provide a unifying perspective on the function of a dialogue manager. We extend van der Sandt's (1992) treatment of presuppositions in DRT to allow presuppositions to support question-answering, the generation of targeted follow-up questions, the identification of dialogue acts, and the semantic disambiguation of questions.
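A toy example of presupposition-driven question handling: "When did Mary arrive?" presupposes "Mary arrived", so the system answers when the presupposition is supported by the context and asks a targeted follow-up when it is not. The representation below is a deliberately crude stand-in for the paper's DRT machinery.

```python
# Toy illustration of presupposition-driven question handling; the
# representation and rules are assumptions, far simpler than DRT.
context = {("arrive", "mary"): {"time": "noon"}}

def answer_when_question(predicate: str, subject: str) -> str:
    """'When did X arrive?' presupposes 'X arrived'; check the context."""
    event = context.get((predicate, subject))
    if event is None:
        # Presupposition fails: ask a targeted follow-up instead of answering.
        return f"Did {subject} {predicate} at all?"
    return event["time"]

print(answer_when_question("arrive", "mary"))  # noon
print(answer_when_question("arrive", "john"))  # Did john arrive at all?
```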
NTT Media Intelligence Laboratories, NTT Corporation; Kyoto University; The University of Electro-Communications; NTT DOCOMO INC.; Fujitsu Laboratories, LTD.; Tohoku University / RIKEN AIP; National Institute for Japanese Language and Linguistics; NTT Communication Science Laboratories, NTT Corporation
In this paper, we describe an implemented service robot called FusionBot. The goal of this research is to explore and demonstrate the utility of an interactive service robot in a smart home environment, thereby improving the quality of human life. The robot has four main features: 1) speech recognition, 2) object recognition, 3) object grabbing and fetching, and 4) communication with a smart coffee machine. Its software architecture employs a multimodal dialogue system that integrates different components, including a spoken dialogue system, vision understanding, navigation, and a smart device gateway. In the experiments conducted during the TechFest 2008 event, FusionBot successfully demonstrated that it could autonomously serve coffee to visitors at their request. Preliminary survey results indicate that the robot has the potential not only to aid general robotics research but also to contribute toward the long-term goal of intelligent service robotics in smart home environments.
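The integration style described, several loosely coupled components exchanging messages, can be sketched as a minimal publish/subscribe bus; the topics and component behaviors below are illustrative assumptions, not FusionBot's actual interfaces.

```python
from collections import defaultdict

# A minimal publish/subscribe bus sketching how loosely coupled robot
# components (speech, vision, navigation, device gateway) might integrate.
class Bus:
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, msg):
        for handler in self.handlers[topic]:
            handler(msg)

bus = Bus()
bus.subscribe("intent", lambda m: print(f"navigation: driving to {m['target']}"))
bus.subscribe("intent", lambda m: print(f"gateway: brewing {m['item']}"))
# The spoken dialogue component publishes a recognized request:
bus.publish("intent", {"target": "kitchen", "item": "coffee"})
```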
We propose a method for learning dialogue management policies from a fixed dataset. The method is designed for use with "Information State Update" (ISU)-based dialogue systems, which represent the state of a dialogue as a large set of features, resulting in a very large state space and a very large policy space. To address the problem that any fixed dataset will only provide information about small portions of these state and policy spaces, we propose a hybrid model which combines reinforcement learning (RL) with ...
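Since the abstract is truncated before naming what RL is combined with, the sketch below shows only one plausible hybrid under that caveat: act on Q estimates where the fixed dataset supports them, and back off to a corpus-derived supervised policy in unexplored states. It is an assumption for illustration, not the paper's method.

```python
# Hedged sketch of one hybrid idea: greedy over Q estimates where the fixed
# dataset gives them, falling back to a supervised policy elsewhere.
q_table = {("slot_unfilled", "ask_slot"): 0.9, ("slot_unfilled", "confirm"): 0.2}
supervised_policy = {"slot_filled": "confirm"}  # learned from the corpus

def choose_action(state: str, actions: list[str]) -> str:
    scored = [(q_table[(state, a)], a) for a in actions if (state, a) in q_table]
    if scored:
        return max(scored)[1]  # RL estimate available for this state
    return supervised_policy.get(state, actions[0])  # back off to supervised

print(choose_action("slot_unfilled", ["ask_slot", "confirm"]))  # ask_slot
print(choose_action("slot_filled", ["ask_slot", "confirm"]))    # confirm
```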
In this paper we describe a system we have developed for automatic broadcast-quality video indexing that successfully combines results from the fields of speaker verification, acoustic analysis, very large vocabulary speech recognition, content-based sampling of video, information retrieval, natural language processing, dialogue systems, and MPEG2 delivery over IP. Our audio classification and anchorperson detection (in the case of news material) classify video into news versus commercials using acoustic features and can reach 97% accuracy on our test data set. The processing includes very large vocabulary speech recognition (over a 230K-word vocabulary) for synchronizing the closed caption stream with the audio stream. Broadcast news corpora are used to generate language models and acoustic models for speaker identification. Compared with conventional discourse segmentation algorithms based only on text information, our integrated method operates more efficiently with more accurate...
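The news-versus-commercials step can be caricatured as a threshold over acoustic features; the features, weights, and threshold below are invented for illustration, and the reported 97% accuracy of course comes from a far richer model than this.

```python
# Toy sketch of classifying broadcast segments from acoustic features;
# features, weights, and threshold are assumptions for illustration only.
def classify_segment(music_energy_ratio: float, speech_rate_wps: float) -> str:
    # Commercials tend to have more music and denser speech than anchor segments.
    score = 0.7 * music_energy_ratio + 0.3 * (speech_rate_wps / 5.0)
    return "commercial" if score > 0.5 else "news"

print(classify_segment(music_energy_ratio=0.8, speech_rate_wps=4.5))  # commercial
print(classify_segment(music_energy_ratio=0.1, speech_rate_wps=2.5))  # news
```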
This paper overviews recent progress in the development of corpus-based spontaneous speech recognition technology, focusing on various achievements of a Japanese 5-year national project, "Spontaneous Speech: Corpus and Processing Technology". Although speech is spontaneous in almost any situation, recognition of spontaneous speech is an area which has only recently emerged in the field of automatic speech recognition. Broadening the application ...
This paper presents the architecture and implementation of an easily extendable gesture library to be used by a dialogue system. The main focus has been on creating and implementing gestures that signal turn-taking and similar functions, in order to make it easier for users to interact with a dialogue system. The idea was to make it possible for the dialogue system to randomly select among a given set of gestures, depending on which dialogue state it is in, and in that way make it more natural and non-repetitive.
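The core mechanism described, random selection among the gestures registered for the current dialogue state, fits in a few lines; the states and gesture names below are illustrative, not the library's actual inventory.

```python
import random

# Pick randomly among the gestures registered for the current dialogue state;
# the variation is what keeps the agent's behavior from looking repetitive.
GESTURE_LIBRARY = {
    "giving_turn": ["open_palm", "gaze_at_user", "eyebrow_raise"],
    "taking_turn": ["look_away", "hand_raise"],
    "listening": ["nod", "head_tilt"],
}

def select_gesture(dialogue_state: str) -> str:
    candidates = GESTURE_LIBRARY.get(dialogue_state, ["idle"])
    return random.choice(candidates)

print(select_gesture("giving_turn"))
```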
Within the EC-funded project SQEL, the German EVAR spoken dialogue system has been extended with respect to multilinguality and multifunctionality. The current demonstrator can handle four different languages and domains: German, Slovak, and Czech (and their national train connections), and Slovenian (European flights). The SQEL demonstrator can also access databases on the WWW, which enables users without an internet connection to meet their information needs by just using the phone. The system starts up with a German opening phrase and the user is free to use any of the implemented languages. A multilingual word recognizer implicitly identifies the language, which is then associated with the appropriate domain and database. For the remainder of the dialogue, the corresponding monolingual recognizer is used instead. Experiments to date have shown that the multilingual and the (respective) monolingual recognizers attain comparable word accuracy rates, although the former is less effi...
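The routing behavior described, identify the language once and then commit to the matching monolingual recognizer and domain for the rest of the dialogue, can be sketched as follows; the recognizer names are placeholders, and only the language-to-domain pairing comes from the abstract.

```python
# Language-to-domain pairing as described in the abstract; the recognizer
# names are hypothetical placeholders.
LANGUAGE_TO_DOMAIN = {
    "German": "German train connections",
    "Slovak": "Slovak train connections",
    "Czech": "Czech train connections",
    "Slovenian": "European flights",
}

def route_dialogue(first_utterance_language: str) -> tuple[str, str]:
    """After implicit language ID, pick the monolingual recognizer and domain."""
    domain = LANGUAGE_TO_DOMAIN[first_utterance_language]
    recognizer = f"monolingual_{first_utterance_language.lower()}_recognizer"
    return recognizer, domain

print(route_dialogue("Slovenian"))
# ('monolingual_slovenian_recognizer', 'European flights')
```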
We present a real-time multi-modal dialogue system for conversations with intelligent autonomous systems, in this case a robot helicopter, or UAV ('Unmanned Aerial Vehicle') [1]. The system operates over a dynamic environment, which supersedes the standard travel-planning ...