-
Controllable Generation of Dialogue Acts for Dialogue Systems via Few-Shot Response Generation and Ranking
Authors:
Angela Ramirez,
Karik Agarwal,
Juraj Juraska,
Utkarsh Garg,
Marilyn A. Walker
Abstract:
Dialogue systems need to produce responses that realize multiple types of dialogue acts (DAs) with high semantic fidelity. In the past, natural language generators (NLGs) for dialogue were trained on large parallel corpora that map from a domain-specific DA and its semantic attributes to an output utterance. Recent work shows that large pretrained language models (LLMs) offer new possibilities for controllable NLG using prompt-based learning. Here we develop a novel few-shot overgenerate-and-rank approach that achieves the controlled generation of DAs. We compare eight few-shot prompt styles that include a novel method of generating from textual pseudo-references using a textual style transfer approach. We develop six automatic ranking functions that identify outputs with both the correct DA and high semantic accuracy at generation time. We test our approach on three domains and four LLMs. To our knowledge, this is the first work on NLG for dialogue that automatically ranks outputs using both DA and attribute accuracy. For completeness, we compare our results to fine-tuned few-shot models trained with 5 to 100 instances per DA. Our results show that several prompt settings achieve perfect DA accuracy and near-perfect semantic accuracy (99.81%), and perform better than few-shot fine-tuning.
Submitted 26 July, 2023;
originally announced July 2023.
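The overgenerate-and-rank idea in the Ramirez et al. abstract can be illustrated with a minimal Python sketch. The sketch samples several candidate responses and scores each one for dialogue-act (DA) correctness and attribute (slot) accuracy. It then keeps the best candidate. Three parts of it are assumptions. The DA classifier is a hypothetical rule of thumb. Slot accuracy is naive exact-string matching. The slot names and candidates are toy data. None of these is one of the paper's six ranking functions.

```python
from typing import Callable, Dict, List, Tuple


def slot_accuracy(utterance: str, slots: Dict[str, str]) -> float:
    """Fraction of slot values found verbatim (case-insensitive) in the utterance."""
    if not slots:
        return 1.0
    text = utterance.lower()
    found = sum(1 for value in slots.values() if value.lower() in text)
    return found / len(slots)


def rank_candidates(
    candidates: List[str],
    target_da: str,
    slots: Dict[str, str],
    da_classifier: Callable[[str], str],
) -> List[Tuple[float, float, str]]:
    """Score each overgenerated candidate by (DA match, slot accuracy), best first."""
    scored = []
    for cand in candidates:
        da_match = 1.0 if da_classifier(cand) == target_da else 0.0
        scored.append((da_match, slot_accuracy(cand, slots), cand))
    # DA correctness dominates; slot accuracy breaks ties among correct-DA outputs.
    scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return scored


def toy_da_classifier(utterance: str) -> str:
    """Hypothetical stand-in for a trained DA classifier."""
    return "request" if utterance.rstrip().endswith("?") else "inform"


# Toy candidates, as if sampled from a few-shot prompt.
candidates = [
    "Do you like the game Portal 2?",
    "Portal 2 is a puzzle game released in 2011.",
    "Portal 2 is a game.",
]
slots = {"name": "Portal 2", "genre": "puzzle", "release_year": "2011"}
ranked = rank_candidates(candidates, "inform", slots, toy_da_classifier)
best = ranked[0][2]
```

The sort key is the tuple (DA match, slot accuracy), so DA correctness dominates the ranking. This matches the abstract's goal of outputs that have both the correct DA and high semantic accuracy. A weighted sum of the two scores would be an equally plausible design.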
-
A Deep Ensemble Model with Slot Alignment for Sequence-to-Sequence Natural Language Generation
Authors:
Juraj Juraska,
Panagiotis Karagiannis,
Kevin K. Bowden,
Marilyn A. Walker
Abstract:
Natural language generation lies at the core of generative dialogue systems and conversational agents. We describe an ensemble neural language generator, and present several novel methods for data representation and augmentation that yield improved results in our model. We test the model on three datasets in the restaurant, TV and laptop domains, and report both objective and subjective evaluations of our best model. Using a range of automatic metrics, as well as human evaluators, we show that our approach achieves better results than state-of-the-art models on the same datasets.
Submitted 16 May, 2018;
originally announced May 2018.
-
Exploring Conversational Language Generation for Rich Content about Hotels
Authors:
Marilyn A. Walker,
Albry Smither,
Shereen Oraby,
Vrindavan Harrison,
Hadar Shemtov
Abstract:
Dialogue systems for hotel and tourist information have typically simplified the richness of the domain, focusing system utterances on only a few selected attributes such as price, location and type of rooms. However, much more content is typically available for hotels, often as many as 50 distinct instantiated attributes for an individual entity. New methods are needed to use this content to generate natural dialogues for hotel information, and in general for any domain with such rich complex content. We describe three experiments aimed at collecting data that can inform an NLG for hotel dialogues, and show, not surprisingly, that the sentences in the original written hotel descriptions provided on webpages for each hotel are stylistically not a very good match for conversational interaction. We quantify the stylistic features that characterize the differences between the original textual data and the collected dialogic data. We plan to use these in stylistic models for generation, and for scoring retrieved utterances for use in hotel dialogues.
Submitted 1 May, 2018;
originally announced May 2018.
-
PlaTIBART: a Platform for Transactive IoT Blockchain Applications with Repeatable Testing
Authors:
Michael A. Walker,
Abhishek Dubey,
Aron Laszka,
Douglas C. Schmidt
Abstract:
With the advent of blockchain-enabled IoT applications, there is an increased need for related software patterns, middleware concepts, and testing practices to ensure adequate quality and productivity. IoT and blockchain each provide different design goals, concepts, and practices that must be integrated, including the distributed actor model and fault tolerance from IoT and transactive information integrity over untrustworthy sources from blockchain. Both IoT and blockchain are emerging technologies and both lack codified patterns and practices for development of applications when combined. This paper describes PlaTIBART, which is a platform for transactive IoT blockchain applications with repeatable testing that combines the Actor pattern (which is a commonly used model of computation in IoT) together with a custom Domain Specific Language (DSL) and test network management tools. We show how PlaTIBART has been applied to develop, test, and analyze fault-tolerant IoT blockchain applications.
Submitted 29 September, 2017; v1 submitted 27 September, 2017;
originally announced September 2017.
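PlaTIBART builds on the Actor pattern. In that pattern, each component has private state and a mailbox of messages that it processes one at a time. The sketch below is a generic Python illustration that assumes a thread-per-actor design. The class names Actor and EnergyMeter and the ("offer", amount) message format are hypothetical. They do not reflect PlaTIBART's implementation or its DSL.

```python
import queue
import threading


class Actor:
    """Minimal actor: private state, a mailbox, and one thread handling messages in order."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronously enqueue a message; callers never touch the actor's state directly."""
        self._mailbox.put(message)

    def stop(self):
        """Enqueue a sentinel and wait until every earlier message has been processed."""
        self._mailbox.put(None)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class EnergyMeter(Actor):
    """Hypothetical IoT actor that tallies energy offers."""

    def __init__(self):
        self.total_kwh = 0.0  # state must exist before the actor thread starts
        super().__init__()

    def receive(self, message):
        kind, amount = message
        if kind == "offer":
            self.total_kwh += amount


meter = EnergyMeter()
meter.send(("offer", 1.5))
meter.send(("offer", 2.0))
meter.stop()  # after join(), reading total_kwh from this thread is safe
```

Only the actor's own thread ever mutates total_kwh, so no lock is needed. That property is the reason the pattern suits distributed, fault-tolerant IoT components.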
-
Measuring the Similarity of Sentential Arguments in Dialog
Authors:
Amita Misra,
Brian Ecker,
Marilyn A. Walker
Abstract:
When people converse about social or political topics, similar arguments are often paraphrased by different speakers, across many different conversations. Debate websites produce curated summaries of arguments on such topics; these summaries typically consist of lists of sentences that represent frequently paraphrased propositions, or labels capturing the essence of one particular aspect of an argument, e.g. Morality or Second Amendment. We call these frequently paraphrased propositions ARGUMENT FACETS. Like these curated sites, our goal is to induce and identify argument facets across multiple conversations, and produce summaries. However, we aim to do this automatically. We frame the problem as consisting of two steps: we first extract sentences that express an argument from raw social media dialogs, and then rank the extracted arguments in terms of their similarity to one another. Sets of similar arguments are used to represent argument facets. We show here that we can predict ARGUMENT FACET SIMILARITY with a correlation averaging 0.63 compared to a human topline averaging 0.68 over three debate topics, easily beating several reasonable baselines.
Submitted 6 September, 2017;
originally announced September 2017.
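The evaluation reported for argument facet similarity computes a correlation for each debate topic and averages it over three topics. It can be sketched as follows. Pearson's r is an assumption, since the abstract does not name the coefficient. The topic names and scores are toy values, not the paper's data.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between predicted and gold similarity scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_topic_correlation(per_topic):
    """per_topic maps a topic name to (predicted scores, gold scores) for its argument pairs."""
    return mean(pearson_r(pred, gold) for pred, gold in per_topic.values())


# Toy data only: one perfectly correlated topic and one perfectly anti-correlated topic.
toy = {
    "topic_a": ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    "topic_b": ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
}
avg = average_topic_correlation(toy)
```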
-
Storytelling Agents with Personality and Adaptivity
Authors:
Zhichao Hu,
Marilyn A. Walker,
Michael Neff,
Jean E. Fox Tree
Abstract:
We explore the expression of personality and adaptivity through the gestures of virtual agents in a storytelling task. We conduct two experiments using four different dialogic stories. We manipulate agent personality on the extraversion scale, whether the agents adapt to one another in their gestural performance, and agent gender. Our results show that subjects are able to perceive the intended variation in extraversion between different virtual agents, independently of the story they are telling and the gender of the agent. A second study shows that subjects also prefer adaptive to nonadaptive virtual agents.
Submitted 4 September, 2017;
originally announced September 2017.
-
Getting Reliable Annotations for Sarcasm in Online Dialogues
Authors:
Reid Swanson,
Stephanie Lukin,
Luke Eisenberg,
Thomas Chase Corcoran,
Marilyn A. Walker
Abstract:
The language used in online forums differs in many ways from that of traditional language resources such as news. One difference is the use and frequency of nonliteral, subjective dialogue acts such as sarcasm. Whether the aim is to develop a theory of sarcasm in dialogue, or engineer automatic methods for reliably detecting sarcasm, a major challenge is simply the difficulty of getting enough reliably labelled examples. In this paper we describe our work on methods for achieving highly reliable sarcasm annotations from untrained annotators on Mechanical Turk. We explore the use of a number of common statistical reliability measures, such as Kappa, Karger's, Majority Class, and EM. We show that more sophisticated measures do not appear to yield better results for our data than simple measures such as assuming that the correct label is the one that a majority of Turkers apply.
Submitted 4 September, 2017;
originally announced September 2017.
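The abstract found that a simple measure was competitive: take the label that most Turkers applied. This amounts to a majority vote per item. Below is a minimal sketch that assumes labels arrive as a list of strings per item. Returning None on a tie is an illustrative choice that the abstract does not specify.

```python
from collections import Counter


def majority_label(labels, tie_label=None):
    """Return the label most annotators applied, or tie_label if the top count is shared."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


def aggregate(annotations):
    """annotations maps an item id to the list of labels it received from annotators."""
    return {item: majority_label(labels) for item, labels in annotations.items()}


toy_annotations = {
    "post1": ["sarcastic", "sarcastic", "not_sarcastic"],
    "post2": ["not_sarcastic"] * 3 + ["sarcastic"] * 2,
    "post3": ["sarcastic", "not_sarcastic"],
}
result = aggregate(toy_annotations)
```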
-
Unsupervised Induction of Contingent Event Pairs from Film Scenes
Authors:
Zhichao Hu,
Elahe Rahimtoroghi,
Larissa Munishkina,
Reid Swanson,
Marilyn A. Walker
Abstract:
Human engagement in narrative is partially driven by reasoning about discourse relations between narrative events, and the expectations about what is likely to happen next that result from such reasoning. Researchers in NLP have tackled modeling such expectations from a range of perspectives, including treating it as the inference of the contingent discourse relation, or as a type of common-sense causal reasoning. Our approach is to model likelihood between events by drawing on several of these lines of previous work. We implement and evaluate different unsupervised methods for learning event pairs that are likely to be contingent on one another. We refine event pairs that we learn from a corpus of film scene descriptions utilizing web search counts, and evaluate our results by collecting human judgments of contingency. Our results indicate that the use of web search counts increases the average accuracy of our best method to 85.64% over a baseline of 50%, as compared to an average accuracy of 75.15% without web search.
Submitted 30 August, 2017;
originally announced August 2017.
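Pointwise mutual information (PMI) is one common unsupervised score for event pairs that tend to co-occur in sequence. The sketch below scores adjacent event pairs with PMI. It assumes events have already been extracted from scene descriptions as strings. PMI is an assumed stand-in for the methods the abstract compares, and the web search count refinement is not modelled.

```python
import math
from collections import Counter


def adjacent_pmi(sequences):
    """PMI for each pair (a, b) where event b immediately follows event a in some sequence."""
    event_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        event_counts.update(seq)
        pair_counts.update(zip(seq, seq[1:]))
    n_events = sum(event_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n_pairs
        p_a = event_counts[a] / n_events
        p_b = event_counts[b] / n_events
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


toy_scenes = [
    ["open door", "enter room"],
    ["open door", "enter room"],
    ["pick up phone", "answer call"],
]
scores = adjacent_pmi(toy_scenes)
```

PMI is known to overrate rare pairs. In the toy data, the pair seen once scores higher than the pair seen twice. Some filtering or re-scoring step is therefore usually needed on top of it.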
-
Inferring Narrative Causality between Event Pairs in Films
Authors:
Zhichao Hu,
Marilyn A. Walker
Abstract:
To understand narrative, humans draw inferences about the underlying relations between narrative events. Cognitive theories of narrative understanding define these inferences as four different types of causality, ranging from pairs of events A, B where A physically causes B (X drop, X break) to pairs of events where A causes emotional state B (Y saw X, Y felt fear). Previous work on learning narrative relations from text has either focused on "strict" physical causality, or has been vague about what relation is being learned. This paper learns pairs of causal events from a corpus of film scene descriptions which are action rich and tend to be told in chronological order. We show that event pairs induced using our methods are of high quality and are judged to have a stronger causal relation than event pairs from Rel-grams.
Submitted 30 August, 2017;
originally announced August 2017.
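Film scene descriptions tend to be told in chronological order, so event order can serve as evidence of causal direction. Causal potential (Beamer and Girju, 2009) is one published measure that uses this. It adds an ordering term to PMI. The abstract does not say which measure the paper uses, so treat this sketch as an illustrative assumption. The event strings are toy data, and add-one smoothing is applied to the ordering counts.

```python
import math
from collections import Counter
from itertools import combinations


def causal_potential(sequences, smoothing=1.0):
    """CP(a, b) = PMI(a, b) + log(count(a before b) / count(b before a)), with smoothing."""
    event_counts = Counter()
    ordered = Counter()  # (a, b) means a occurs earlier than b in the same sequence
    for seq in sequences:
        event_counts.update(seq)
        # combinations() keeps sequence order, so each tuple is (earlier, later)
        ordered.update((a, b) for a, b in combinations(seq, 2) if a != b)
    n_events = sum(event_counts.values())
    n_pairs = sum(ordered.values())
    scores = {}
    for (a, b), c_ab in ordered.items():
        c_ba = ordered.get((b, a), 0)
        p_cooc = (c_ab + c_ba) / n_pairs  # order-insensitive co-occurrence
        p_a = event_counts[a] / n_events
        p_b = event_counts[b] / n_events
        pmi = math.log(p_cooc / (p_a * p_b))
        order_term = math.log((c_ab + smoothing) / (c_ba + smoothing))
        scores[(a, b)] = pmi + order_term
    return scores


toy_scenes = [["drop", "break"], ["drop", "break"], ["break", "drop"]]
cp = causal_potential(toy_scenes)
```

Both directions share the same PMI. Only the ordering term separates "drop, break" from "break, drop".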
-
Inference of Fine-Grained Event Causality from Blogs and Films
Authors:
Zhichao Hu,
Elahe Rahimtoroghi,
Marilyn A. Walker
Abstract:
Human understanding of narrative is mainly driven by reasoning about causal relations between events, and thus recognizing them is a key capability for computational models of language understanding. Computational work in this area has approached this via two different routes: by focusing on acquiring a knowledge base of common causal relations between events, or by attempting to understand a particular story or macro-event, along with its storyline. In this position paper, we focus on the knowledge acquisition approach and claim that newswire is a relatively poor source for learning fine-grained causal relations between everyday events. We describe experiments using an unsupervised method to learn causal relations between events in the narrative genres of first-person narratives and film scene descriptions. We show that our method learns fine-grained causal relations, judged by humans as likely to be causal over 80% of the time. We also demonstrate that the learned event pairs do not exist in publicly available event-pair datasets extracted from newswire.
Submitted 30 August, 2017;
originally announced August 2017.
-
Learning Fine-Grained Knowledge about Contingent Relations between Everyday Events
Authors:
Elahe Rahimtoroghi,
Ernesto Hernandez,
Marilyn A. Walker
Abstract:
Much of the user-generated content on social media is provided by ordinary people telling stories about their daily lives. We develop and test a novel method for learning fine-grained common-sense knowledge from these stories about contingent (causal and conditional) relationships between everyday events. This type of knowledge is useful for text and story understanding, information extraction, question answering, and text summarization. We test and compare different methods for learning contingency relations, and compare what is learned from topic-sorted story collections vs. general-domain stories. Our experiments show that using topic-specific datasets enables learning finer-grained knowledge about events and results in significant improvement over the baselines. An evaluation on Amazon Mechanical Turk shows 82% of the relations between events that we learn from topic-sorted stories are judged as contingent.
Submitted 30 August, 2017;
originally announced August 2017.
-
Automating Direct Speech Variations in Stories and Games
Authors:
Stephanie M. Lukin,
James O. Ryan,
Marilyn A. Walker
Abstract:
Dialogue authoring in large games requires not only content creation but also attention to the subtlety of its delivery, which can vary from character to character. Manually authoring this dialogue can be tedious, time-consuming, or even altogether infeasible. This paper utilizes a rich narrative representation for modeling dialogue and an expressive natural language generation engine for realizing it, and expands upon a translation tool that bridges the two. We add functionality to the translator to allow direct speech to be modeled by the narrative representation, whereas the original translator supports only narratives told by a third person narrator. We show that we can perform character substitution in dialogues. We implement and evaluate a potential application to dialogue implementation: generating dialogue for games with big, dynamic, or procedurally-generated open worlds. We present a pilot study on human perceptions of the personalities of characters using direct speech, assuming unknown personality types at the time of authoring.
Submitted 29 August, 2017;
originally announced August 2017.
-
PersonaBank: A Corpus of Personal Narratives and Their Story Intention Graphs
Authors:
Stephanie M. Lukin,
Kevin Bowden,
Casey Barackman,
Marilyn A. Walker
Abstract:
We present a new corpus, PersonaBank, consisting of 108 personal stories from weblogs that have been annotated with their Story Intention Graphs, a deep representation of the fabula of a story. We describe the topics of the stories and the basis of the Story Intention Graph representation, as well as the process of annotating the stories to produce the Story Intention Graphs and the challenges of adapting the tool to this new personal narrative domain. We also discuss how the corpus can be used in applications that retell the story using different styles of tellings, co-tellings, or as a content planner.
Submitted 29 August, 2017;
originally announced August 2017.
-
Modelling Protagonist Goals and Desires in First-Person Narrative
Authors:
Elahe Rahimtoroghi,
Jiaqi Wu,
Ruimin Wang,
Pranav Anand,
Marilyn A. Walker
Abstract:
Many genres of natural language text are narratively structured, a testament to our predilection for organizing our experiences as narratives. There is broad consensus that understanding a narrative requires identifying and tracking the goals and desires of the characters and their narrative outcomes. However, to date, there has been limited work on computational models for this problem. We introduce a new dataset, DesireDB, which includes gold-standard labels for identifying statements of desire, textual evidence for desire fulfillment, and annotations for whether the stated desire is fulfilled given the evidence in the narrative context. We report experiments on tracking desire fulfillment using different methods, and show that an LSTM Skip-Thought model achieves an F-measure of 0.7 on our corpus.
Submitted 29 August, 2017;
originally announced August 2017.
-
Narrative Variations in a Virtual Storyteller
Authors:
Stephanie M. Lukin,
Marilyn A. Walker
Abstract:
Research on storytelling over the last 100 years has distinguished at least two levels of narrative representation: (1) story, or fabula; and (2) discourse, or sjuzhet. We use this distinction to create Fabula Tales, a computational framework for a virtual storyteller that can tell the same story in different ways through the implementation of general narratological variations, such as varying direct vs. indirect speech, character voice (style), point of view, and focalization. A strength of our computational framework is that it is based on very general methods for re-using existing story content, either from fables or from personal narratives collected from blogs. We first explain how a simple annotation tool allows naive annotators to easily create a deep representation of fabula called a story intention graph, and show how we use this representation to generate story tellings automatically. Then we present results of two studies testing our narratological parameters, and showing that different tellings affect the reader's perception of the story and characters.
Submitted 28 August, 2017;
originally announced August 2017.
-
Generating Sentence Planning Variations for Story Telling
Authors:
Stephanie M. Lukin,
Lena I. Reed,
Marilyn A. Walker
Abstract:
There has been a recent explosion in applications for dialogue interaction ranging from direction-giving and tourist information to interactive story systems. Yet the natural language generation (NLG) component for many of these systems remains largely handcrafted. This limitation greatly restricts the range of applications; it also means that it is impossible to take advantage of recent work in expressive and statistical language generation that can dynamically and automatically produce a large number of variations of given content. We propose that a solution to this problem lies in new methods for developing language generation resources. We describe the ES-Translator (EST), a computational language generator that has previously been applied only to fables, and quantitatively evaluate the domain independence of the EST by applying it to personal narratives from weblogs. We then take advantage of recent work on language generation to create a parameterized sentence planner for story generation that provides aggregation operations, variations in discourse and in point of view. Finally, we present a user evaluation of different personal narrative retellings.
Submitted 28 August, 2017;
originally announced August 2017.
-
Identifying Subjective and Figurative Language in Online Dialogue
Authors:
Stephanie M. Lukin,
Luke Eisenberg,
Thomas Corcoran,
Marilyn A. Walker
Abstract:
More and more of the information on the web is dialogic, from Facebook newsfeeds, to forum conversations, to comment threads on news articles. In contrast to traditional, monologic resources such as news, highly social dialogue is very frequent in social media. We aim to automatically identify sarcastic and nasty utterances in unannotated online dialogue, extending a bootstrapping method previously applied to the classification of monologic subjective sentences in Riloff and Wiebe (2003). We have adapted the method to fit the sarcastic and nasty dialogic domain. Our method is as follows: 1) Explore methods for identifying sarcastic and nasty cue words and phrases in dialogues; 2) Use the learned cues to train a sarcastic (nasty) Cue-Based Classifier; 3) Learn general syntactic extraction patterns from the sarcastic (nasty) utterances and define fine-tuned sarcastic patterns to create a Pattern-Based Classifier; 4) Combine both Cue-Based and fine-tuned Pattern-Based Classifiers to maximize precision at the expense of recall and test on unannotated utterances.
Submitted 28 August, 2017;
originally announced August 2017.
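Steps 2 and 4 of the method can be sketched as two high-precision classifiers whose outputs are combined. The cue phrases and regular expressions below are hypothetical placeholders and were not learned from data. Requiring both classifiers to fire is an assumption about how precision might be traded for recall. The cue-learning and syntactic-pattern-extraction steps are not shown.

```python
import re

# Hypothetical placeholders; in the described method these would be learned from annotated dialogue.
CUES = ["oh really", "yeah right", "wow"]
PATTERNS = [r"\bwow,? (such|what) a\b", r"\boh really\b.*\?"]


def cue_classifier(utterance, cues, min_cues=1):
    """Fire if the utterance contains at least min_cues cue phrases (case-insensitive)."""
    text = utterance.lower()
    return sum(1 for cue in cues if cue in text) >= min_cues


def pattern_classifier(utterance, patterns):
    """Fire if any extraction pattern matches the utterance."""
    return any(re.search(p, utterance, flags=re.IGNORECASE) for p in patterns)


def combined_classifier(utterance, cues=CUES, patterns=PATTERNS):
    """Label sarcastic only when both classifiers agree: fewer hits, but each is more reliable."""
    return cue_classifier(utterance, cues) and pattern_classifier(utterance, patterns)


examples = {
    "Oh really? You think that proves anything?": combined_classifier(
        "Oh really? You think that proves anything?"),
    "Oh really, I did not know that.": combined_classifier("Oh really, I did not know that."),
    "Wow, what a brilliant argument.": combined_classifier("Wow, what a brilliant argument."),
    "The bill passed yesterday.": combined_classifier("The bill passed yesterday."),
}
```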
-
Generating Different Story Tellings from Semantic Representations of Narrative
Authors:
Elena Rishes,
Stephanie M. Lukin,
David K. Elson,
Marilyn A. Walker
Abstract:
In order to tell stories in different voices for different audiences, interactive story systems require: (1) a semantic representation of story structure, and (2) the ability to automatically generate story and dialogue from this semantic representation using some form of Natural Language Generation (NLG). However, there has been limited research on methods for linking story structures to narrative descriptions of scenes and story events. In this paper we present an automatic method for converting from Scheherazade's story intention graph, a semantic representation, to the input required by the Personage NLG engine. Using 36 Aesop Fables distributed in DramaBank, a collection of story encodings, we train translation rules on one story and then test these rules by generating text for the remaining 35. The results are measured in terms of the string similarity metrics Levenshtein Distance and BLEU score. The results show that we can generate the 35 stories with correct content: the test set stories on average are close to the output of the Scheherazade realizer, which was customized to this semantic representation. We provide some examples of story variations generated by Personage. In future work, we will experiment with measuring the quality of the same stories generated in different voices, and with techniques for making storytelling interactive.
Submitted 28 August, 2017;
originally announced August 2017.
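Levenshtein distance is one of the two string similarity metrics used in the evaluation. It is the minimum number of insertions, deletions, and substitutions needed to turn one sequence into another. Below is a standard dynamic-programming sketch in Python. It works on character strings or on token lists. The abstract does not state whether the paper compared characters or tokens.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or token lists), using two DP rows."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


char_dist = levenshtein("kitten", "sitting")
token_dist = levenshtein("the crow sat on a branch".split(),
                         "the crow perched on a branch".split())
```

Token-level distance counts a single word swap as one edit. Character-level distance would penalize the same swap by the number of differing characters.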
-
M2D: Monolog to Dialog Generation for Conversational Story Telling
Authors:
Kevin K. Bowden,
Grace I. Lin,
Lena I. Reed,
Marilyn A. Walker
Abstract:
Storytelling serves many different social functions, e.g. stories are used to persuade, share troubles, establish shared values, learn social behaviors, and entertain. Moreover, stories are often told conversationally through dialog, and previous work suggests that information provided dialogically is more engaging than when provided in monolog. In this paper, we present algorithms for converting a deep representation of a story into a dialogic storytelling, that can vary aspects of the telling, including the personality of the storytellers. We conduct several experiments to test whether dialogic storytellings are more engaging, and whether automatically generated variants in linguistic form that correspond to personality differences can be recognized in an extended storytelling dialog.
Submitted 24 August, 2017;
originally announced August 2017.
-
Individual and Domain Adaptation in Sentence Planning for Dialogue
Authors:
F. Mairesse,
R. Prasad,
A. Stent,
M. A. Walker
Abstract:
One of the biggest challenges in the development and deployment of spoken dialogue systems is the design of the spoken language generation module. This challenge arises from the need for the generator to adapt to many features of the dialogue domain, user population, and dialogue context. A promising approach is trainable generation, which uses general-purpose linguistic knowledge that is automatically adapted to the features of interest, such as the application domain, individual user, or user group. In this paper we present and evaluate a trainable sentence planner for providing restaurant information in the MATCH dialogue system. We show that trainable sentence planning can produce complex information presentations whose quality is comparable to the output of a template-based generator tuned to this domain. We also show that our method easily supports adapting the sentence planner to individuals, and that the individualized sentence planners generally perform better than models trained and tested on a population of individuals. Previous work has documented and utilized individual preferences for content selection, but to our knowledge, these results provide the first demonstration of individual preferences for sentence planning operations, affecting the content order, discourse structure and sentence structure of system responses. Finally, we evaluate the contribution of different feature sets, and show that, in our application, n-gram features often do as well as features based on higher-level linguistic representations.
Submitted 31 October, 2011;
originally announced November 2011.
-
Learning Content Selection Rules for Generating Object Descriptions in Dialogue
Authors:
P. W. Jordan,
M. A. Walker
Abstract:
A fundamental requirement of any task-oriented dialogue system is the ability to generate object descriptions that refer to objects in the task domain. The subproblem of content selection for object descriptions in task-oriented dialogue has been the focus of much previous work, and a large number of models have been proposed. In this paper, we use the annotated COCONUT corpus of task-oriented design dialogues to develop feature sets based on Dale and Reiter's (1995) incremental model, Brennan and Clark's (1996) conceptual pact model, and Jordan's (2000b) intentional influences model, and use these feature sets in a machine learning experiment to automatically learn a model of content selection for object descriptions. Since Dale and Reiter's model requires a representation of discourse structure, the corpus annotations are used to derive a representation based on Grosz and Sidner's (1986) theory of the intentional structure of discourse, as well as two very simple representations of discourse structure based purely on recency. We then apply the rule-induction program RIPPER to train and test the content selection component of an object description generator on a set of 393 object descriptions from the corpus. To our knowledge, this is the first reported experiment of a trainable content selection component for object description generation in dialogue. Three separate content selection models, each based on one of the three theoretical models, independently achieve accuracies significantly above the majority class baseline (17%) on unseen test data, with the intentional influences model (42.4%) performing significantly better than either the incremental model (30.4%) or the conceptual pact model (28.9%). However, the best-performing models combine all the feature sets, achieving accuracies near 60%. Surprisingly, a simple recency-based representation of discourse structure does as well as one based on intentional structure. To our knowledge, this is also the first empirical comparison of a representation of Grosz and Sidner's model of discourse structure with a simpler model for any generation task.
Submitted 9 September, 2011;
originally announced September 2011.
-
Automatically Training a Problematic Dialogue Predictor for a Spoken Dialogue System
Authors:
A. Gorin,
I. Langkilde-Geary,
M. A. Walker,
J. Wright,
H. Wright Hastie
Abstract:
Spoken dialogue systems promise efficient and natural access to a large variety of information sources and services from any phone. However, current spoken dialogue systems are deficient in their strategies for preventing, identifying and repairing problems that arise in the conversation. This paper reports results on automatically training a Problematic Dialogue Predictor to predict problematic human-computer dialogues using a corpus of 4692 dialogues collected with the 'How May I Help You' (SM) spoken dialogue system. The Problematic Dialogue Predictor can be immediately applied to the system's decision of whether to transfer the call to a human customer care agent, or be used as a cue to the system's dialogue manager to modify its behavior to repair problems, and perhaps even to prevent them. We show that a Problematic Dialogue Predictor using automatically obtainable features from the first two exchanges in the dialogue can predict problematic dialogues 13.2% more accurately than the baseline.
Submitted 9 June, 2011;
originally announced June 2011.
-
An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email
Authors:
M. A. Walker
Abstract:
This paper describes a novel method by which a spoken dialogue system can learn to choose an optimal dialogue strategy from its experience interacting with human users. The method is based on a combination of reinforcement learning and performance modeling of spoken dialogue systems. The reinforcement learning component applies Q-learning (Watkins, 1989), while the performance modeling component applies the PARADISE evaluation framework (Walker et al., 1997) to learn the performance function (reward) used in reinforcement learning. We illustrate the method with a spoken dialogue system named ELVIS (EmaiL Voice Interactive System), that supports access to email over the phone. We conduct a set of experiments for training an optimal dialogue strategy on a corpus of 219 dialogues in which human users interact with ELVIS over the phone. We then test that strategy on a corpus of 18 dialogues. We show that ELVIS can learn to optimize its strategy selection for agent initiative, for reading messages, and for summarizing email folders.
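The abstract names two ingredients, Q-learning and a reward learned with PARADISE, without showing how they fit together. Below is a minimal, generic sketch of tabular Q-learning replayed over logged dialogues in which the only reward is a dialogue-level performance score delivered at the end. It is not ELVIS's implementation: the state names, actions, rewards, learning rate `alpha`, discount `gamma`, and number of sweeps are all hypothetical.

```python
from collections import defaultdict

def q_learning(episodes, actions_by_state, alpha=0.1, gamma=1.0, sweeps=200):
    """Tabular Q-learning replayed over logged dialogues (illustrative sketch).

    Each episode is (steps, final_reward): steps is a list of (state, action)
    pairs in dialogue order, and final_reward is a dialogue-level score
    (hypothetically, a PARADISE-style performance estimate).
    """
    q = defaultdict(float)  # (state, action) -> value estimate, default 0.0
    for _ in range(sweeps):
        for steps, final_reward in episodes:
            for i, (state, action) in enumerate(steps):
                if i + 1 < len(steps):
                    # No intermediate reward: bootstrap from the best action in the next state.
                    next_state = steps[i + 1][0]
                    target = gamma * max(q[(next_state, a)] for a in actions_by_state[next_state])
                else:
                    # Last choice in the dialogue: the target is the dialogue-level reward.
                    target = final_reward
                # Watkins-style update toward the target.
                q[(state, action)] += alpha * (target - q[(state, action)])
    return q

def greedy_policy(q, actions_by_state):
    """Pick the highest-valued action in each state."""
    return {s: max(acts, key=lambda a: q[(s, a)]) for s, acts in actions_by_state.items()}

if __name__ == "__main__":
    # Hypothetical strategy choices: initiative first, then how to present a folder.
    actions = {
        "greet": ["system_init", "mixed_init"],
        "sys_folder": ["summarize", "read_first"],
        "mixed_folder": ["summarize", "read_first"],
    }
    logged = [
        ([("greet", "system_init"), ("sys_folder", "summarize")], 0.8),
        ([("greet", "mixed_init"), ("mixed_folder", "summarize")], 0.3),
        ([("greet", "system_init"), ("sys_folder", "read_first")], 0.5),
        ([("greet", "mixed_init"), ("mixed_folder", "read_first")], 0.4),
    ]
    q = q_learning(logged, actions)
    print(greedy_policy(q, actions))
```

Because Q-learning is off-policy, it can be replayed over a fixed corpus of dialogues like this one. With the made-up rewards above, the learned policy prefers system initiative followed by summarizing the folder; the preference comes entirely from the invented data.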
Submitted 1 June, 2011;
originally announced June 2011.
-
Centering, Anaphora Resolution, and Discourse Structure
Authors:
Marilyn A. Walker
Abstract:
Centering was formulated as a model of the relationship between attentional state, the form of referring expressions, and the coherence of an utterance within a discourse segment (Grosz, Joshi and Weinstein, 1986; Grosz, Joshi and Weinstein, 1995). In this chapter, I argue that the restriction of centering to operating within a discourse segment should be abandoned in order to integrate centering with a model of global discourse structure. The within-segment restriction causes three problems. The first problem is that centers are often continued over discourse segment boundaries with pronominal referring expressions whose form is identical to those that occur within a discourse segment. The second problem is that recent work has shown that listeners perceive segment boundaries at various levels of granularity. If centering models a universal processing phenomenon, it is implausible that each listener is using a different centering algorithm. The third problem is that even for utterances within a discourse segment, there are strong contrasts between utterances whose adjacent utterance within a segment is hierarchically recent and those whose adjacent utterance within a segment is linearly recent. This chapter argues that these problems can be eliminated by replacing Grosz and Sidner's stack model of attentional state with an alternate model, the cache model. I show how the cache model is easily integrated with the centering algorithm, and provide several types of data from naturally occurring discourses that support the proposed integrated model. Future work should provide additional support for these claims with an examination of a larger corpus of naturally occurring discourses.
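The contrast between the stack and cache models of attentional state can be illustrated with a deliberately simplified toy. This is not Walker's formalization: the fixed capacity, the least-recently-used replacement policy, and all class and method names are assumptions made for illustration. In the toy stack model, closing a segment discards the entities mentioned inside it; in the toy cache model, segment boundaries leave the cache untouched and only limited capacity displaces entities.

```python
from collections import OrderedDict

class StackAttention:
    """Toy stack model: each discourse segment gets its own focus space."""

    def __init__(self):
        self.spaces = [[]]  # root focus space for the whole discourse

    def push_segment(self):
        self.spaces.append([])

    def pop_segment(self):
        # Closing a segment discards its focus space and the entities in it.
        if len(self.spaces) > 1:
            self.spaces.pop()

    def mention(self, entity):
        self.spaces[-1].append(entity)

    def accessible(self, entity):
        return any(entity in space for space in self.spaces)

class CacheAttention:
    """Toy cache model: fixed capacity, least-recently-used replacement (assumed)."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.items = OrderedDict()  # insertion order tracks recency of use

    def push_segment(self):
        pass  # segment boundaries leave the cache unchanged

    def pop_segment(self):
        pass

    def mention(self, entity):
        self.items[entity] = None
        self.items.move_to_end(entity)  # mark as most recently used
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entity

    def accessible(self, entity):
        return entity in self.items
```

For example, mentioning "printer", then opening a subsegment that mentions "cable", "plug", and "socket", and then closing it, leaves "printer" (but not "socket") accessible in the stack, while a cache of capacity 2 holds "socket" (but not "printer"). The first outcome mirrors the problem of centers continued across segment boundaries; the second mirrors the prediction that entities from before a long digression must be re-retrieved.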
Submitted 11 August, 1997;
originally announced August 1997.
-
PARADISE: A Framework for Evaluating Spoken Dialogue Agents
Authors:
Marilyn A. Walker,
Diane J. Litman,
Candace A. Kamm,
Alicia Abella
Abstract:
This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a general framework for evaluating spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity.
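In the PARADISE framework, performance is modeled as Performance = α·N(κ) − Σ_i w_i·N(c_i), where κ = (P(A) − P(E)) / (1 − P(E)) measures task success from a confusion matrix of attribute values, the c_i are dialogue cost measures (such as the number of utterances or elapsed time), N is Z-score normalization, and α and the w_i are weights estimated by multiple linear regression against user satisfaction. The sketch below computes these pieces for weights supplied by hand. The weights, the cost names, the use of the population standard deviation, and the orientation of the matrix (columns holding the scenario key, with P(E) estimated from column totals) are assumptions of this sketch; fitting the regression is omitted.

```python
from statistics import mean, pstdev

def kappa(confusion):
    """Task-success kappa from a square confusion matrix.

    Assumed layout: rows are the values the agent obtained, columns are the
    scenario key values; chance agreement P(E) uses the column totals.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    p_agree = sum(confusion[i][i] for i in range(n)) / total
    col_totals = [sum(row[j] for row in confusion) for j in range(n)]
    p_chance = sum((t / total) ** 2 for t in col_totals)
    return (p_agree - p_chance) / (1 - p_chance)

def z_normalize(values):
    """Z-score normalization N(x); assumes the values are not all identical."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def paradise_performance(kappas, costs, alpha, weights):
    """Performance_d = alpha * N(kappa_d) - sum_i w_i * N(c_i,d) for each dialogue d.

    kappas: one kappa per dialogue; costs: {cost_name: [value per dialogue]};
    alpha and weights: hypothetical regression coefficients.
    """
    nk = z_normalize(kappas)
    nc = {name: z_normalize(vals) for name, vals in costs.items()}
    return [
        alpha * nk[d] - sum(weights[name] * nc[name][d] for name in costs)
        for d in range(len(kappas))
    ]
```

Normalizing each term before weighting is what lets success and costs measured on different scales be combined, and κ's correction for chance agreement is what allows comparison across tasks of different complexity.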
Submitted 15 April, 1997;
originally announced April 1997.
-
Improvising Linguistic Style: Social and Affective Bases for Agent Personality
Authors:
Marilyn A. Walker,
Janet E. Cahn,
Stephen J. Whittaker
Abstract:
This paper introduces Linguistic Style Improvisation, a theory and set of algorithms for improvisation of spoken utterances by artificial agents, with applications to interactive story and dialogue systems. We argue that linguistic style is a key aspect of character, and show how speech act representations common in AI can provide abstract representations from which computer characters can improvise. We show that the mechanisms proposed introduce the possibility of socially oriented agents, meet the requirements that lifelike characters be believable, and satisfy particular criteria for improvisation proposed by Hayes-Roth.
Submitted 26 February, 1997;
originally announced February 1997.
-
Inferring Acceptance and Rejection in Dialogue by Default Rules of Inference
Authors:
Marilyn A. Walker
Abstract:
This paper discusses the processes by which conversants in a dialogue can infer whether their assertions and proposals have been accepted or rejected by their conversational partners. It expands on previous work by showing that logical consistency is a necessary indicator of acceptance, but that it is not sufficient, and that logical inconsistency is sufficient as an indicator of rejection, but it is not necessary. I show how conversants can use information structure and prosody as well as logical reasoning in distinguishing between acceptances and logically consistent rejections, and relate this work to previous work on implicature and default reasoning by introducing three new classes of rejection: implicature rejections, epistemic rejections and deliberation rejections. I show how these rejections are inferred as a result of default inferences, which, by other analyses, would have been blocked by the context. In order to account for these facts, I propose a model of the common ground that allows these default inferences to go through, and show how the model, originally proposed to account for the various forms of acceptance, can also model all types of rejection.
Submitted 7 September, 1996;
originally announced September 1996.
-
Limited Attention and Discourse Structure
Authors:
Marilyn A. Walker
Abstract:
This squib examines the role of limited attention in a theory of discourse structure and proposes a model of attentional state that relates current hierarchical theories of discourse structure to empirical evidence about human discourse processing capabilities. First, I present examples that are not predicted by Grosz and Sidner's stack model of attentional state. Then I consider an alternative model of attentional state, the cache model, which accounts for the examples, and which makes particular processing predictions. Finally, I suggest a number of ways that future research could distinguish the predictions of the cache model and the stack model.
Submitted 14 August, 1996; v1 submitted 18 December, 1995;
originally announced December 1995.
-
The Effect of Resource Limits and Task Complexity on Collaborative Planning in Dialogue
Authors:
Marilyn A. Walker
Abstract:
This paper shows how agents' choice in communicative action can be designed to mitigate the effect of their resource limits in the context of particular features of a collaborative planning task. I first motivate a number of hypotheses about effective language behavior based on a statistical analysis of a corpus of natural collaborative planning dialogues. These hypotheses are then tested in a dialogue testbed whose design is motivated by the corpus analysis. Experiments in the testbed examine the interaction between (1) agents' resource limits in attentional capacity and inferential capacity; (2) agents' choice in communication; and (3) features of communicative tasks that affect task difficulty such as inferential complexity, degree of belief coordination required, and tolerance for errors. The results show that good algorithms for communication must be defined relative to the agents' resource limits and the features of the task. Algorithms that are inefficient for inferentially simple, low-coordination, or fault-tolerant tasks are effective when tasks require coordination or complex inferences, or are fault-intolerant. The results provide an explanation for the occurrence of utterances in human dialogues that, prima facie, appear inefficient, and provide the basis for the design of effective algorithms for communicative choice for resource-limited agents.
Submitted 15 November, 1995;
originally announced November 1995.
-
Discourse and Deliberation: Testing a Collaborative Strategy
Authors:
Marilyn A. Walker
Abstract:
A discourse strategy is a strategy for communicating with another agent. Designing effective dialogue systems requires designing agents that can choose among discourse strategies. We claim that the design of effective strategies must take cognitive factors into account, propose a new method for testing the hypothesized factors, and present experimental results on an effective strategy for supporting deliberation. The proposed method of computational dialogue simulation provides a new empirical basis for computational linguistics.
Submitted 16 March, 1995;
originally announced March 1995.
-
Redundancy in Collaborative Dialogue
Authors:
Marilyn A. Walker
Abstract:
In dialogues in which both agents are autonomous, each agent deliberates whether to accept or reject the contributions of the current speaker. A speaker cannot simply assume that a proposal or an assertion will be accepted. However, an examination of a corpus of naturally-occurring problem-solving dialogues shows that agents often do not explicitly indicate acceptance or rejection. Rather, the speaker must infer whether the hearer understands and accepts the current contribution based on indirect evidence provided by the hearer's next dialogue contribution. In this paper, I propose a model of the role of informationally redundant utterances in providing evidence to support inferences about mutual understanding and acceptance. The model (1) requires a theory of mutual belief that supports mutual beliefs of various strengths; (2) explains the function of a class of informationally redundant utterances that cannot be explained by other accounts; and (3) contributes to a theory of dialogue by showing how mutual beliefs can be inferred in the absence of the master-slave assumption.
Submitted 16 March, 1995;
originally announced March 1995.
-
Evaluating Discourse Processing Algorithms
Authors:
Marilyn A. Walker
Abstract:
In order to take steps towards establishing a methodology for evaluating Natural Language systems, we conducted a case study. We attempt to evaluate two different approaches to anaphoric processing in discourse by comparing the accuracy and coverage of two published algorithms for finding the co-specifiers of pronouns in naturally occurring texts and dialogues. We present the quantitative results of hand-simulating these algorithms, but this analysis naturally gives rise to both a qualitative evaluation and recommendations for performing such evaluations in general. We illustrate the general difficulties encountered with quantitative evaluation. These are problems with: (a) allowing for underlying assumptions, (b) determining how to handle underspecifications, and (c) evaluating the contribution of false positives and error chaining.
Submitted 11 October, 1994; v1 submitted 10 October, 1994;
originally announced October 1994.
-
Experimentally Evaluating Communicative Strategies: The Effect of the Task
Authors:
Marilyn A. Walker
Abstract:
Effective problem solving among multiple agents requires a better understanding of the role of communication in collaboration. In this paper we show that there are communicative strategies that greatly improve the performance of resource-bounded agents, but that these strategies are highly sensitive to the task requirements, situation parameters and agents' resource limitations. We base our argument on two sources of evidence: (1) an analysis of a corpus of 55 problem solving dialogues, and (2) experimental simulations of collaborative problem solving dialogues in an experimental world, Design-World, where we parameterize task requirements, agents' resources and communicative strategies.
Submitted 24 August, 1994;
originally announced August 1994.