Natural language generation involves the automatic formulation of natural language sentences. The ultimate goal in NLG is for the computer to produce language that is as meaningful, fluent and idiomatic as that produced by humans. A typical NLG system will include components for selecting and structuring the content to be generated (content planning), assigning content units to sentences (sentence planning) and realizing each content unit in a particular language (surface realization). Although there are surface realizers that can produce fluent output, very little research has been done on adjunct ordering. Adjuncts are either ordered in the surface realizer's input, or all possibilities are generated and the alternatives are ranked using a language model.
In this thesis, I address modifier and adjunct ordering in trainable surface realization for English. First, I describe a chart-based surface realizer I have implemented that uses a probabilistic feature-based tree adjoining grammar extracted automatically from a training corpus. My surface realizer takes input logical forms and performs insertion of function words, word and constituent ordering, and morphological inflection. Its performance is comparable to that of other trainable surface realizers that take the same type of input.
Second, I present a set of experiments in which I compare different approaches to word and constituent ordering for several tasks: the dative and benefactive alternations, ordering of prenominal adjectives, ordering of adverbials, ordering of prepositional phrases, and ordering of adjuncts in general. I compare a classifier-based approach with rich feature sets to two simple relative frequency approaches (one using head words as the only feature, the other using part-of-speech tags as the only feature). In the classifier-based approach, I experiment with lexical, syntactic, semantic, pragmatic, sentence and frequency features. I evaluate these approaches in terms of classification accuracy and improvement in the performance of the surface realizer. I show that the classifier-based approach to word and adjunct ordering improves on the simple relative frequency approaches. I also show that different feature sets are useful for different tasks and genres, with (in general) pragmatic features being more important for dialog and syntactic features for text.
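To make the head-word relative frequency baseline concrete, the following is a minimal sketch in Python, not the implementation used in the thesis. It assumes that ordering decisions are made pairwise from counts of which head word preceded which in training data; the function names `train_pair_counts` and `order_modifiers` and the toy training sequences are hypothetical.

```python
from collections import Counter
from functools import cmp_to_key


def train_pair_counts(sequences):
    """Count how often head word a precedes head word b in the training sequences."""
    counts = Counter()
    for seq in sequences:
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                counts[(a, b)] += 1
    return counts


def order_modifiers(modifiers, counts):
    """Order modifiers so that the more frequently observed pairwise order wins.

    Pairs never seen in training (or seen equally often in both orders)
    compare as equal, so the stable sort keeps their input order.
    """
    def compare(a, b):
        ab, ba = counts[(a, b)], counts[(b, a)]
        if ab > ba:
            return -1
        if ba > ab:
            return 1
        return 0

    return sorted(modifiers, key=cmp_to_key(compare))


# Toy training data (illustrative only): observed prenominal adjective sequences.
training = [
    ["big", "red"],
    ["big", "old", "red"],
    ["old", "red"],
    ["small", "old"],
]
pair_counts = train_pair_counts(training)
ordered = order_modifiers(["red", "old", "big"], pair_counts)
print(ordered)
```

Pairwise preferences estimated this way are not guaranteed to be transitive, so on real data the sort may return an order that violates some observed preferences. A classifier-based approach of the kind compared above replaces the count lookup with a learned decision over richer features.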
Finally, I describe a human evaluation of my initial surface realizer with relative frequency approaches to ordering of adjuncts and modifiers, and of my modified surface realizer which uses a classification-based approach to ordering of adjuncts and modifiers. I show that modeling of adjunct and modifier ordering can lead to small but significant improvements in surface realization performance, and analyze the types of errors identified by the human evaluators.