
Anette Frank

When linguistically motivated grammars are implemented on a larger scale and applied to real-life corpora, keeping track of ambiguity sources becomes a difficult task. Yet it is of great importance, since unintended ambiguities arising from underrestricted rules or interactions have to be distinguished from linguistically warranted ambiguities. In this paper we report on various tools in the XLE grammar development platform which can be used for ambiguity management in grammar writing.
November 10, 2000 Abstract. This paper sketches how one can construct Underspecified Discourse Representation Structures (UDRSs (Reyle, 1993)) via glue semantics (Dalrymple et al., 1999b). In most cases, UDRSs are constructed in linear time, analogously to the linear time construction of skeleton-modifier representations presented in (Gupta and Lamping, 1998).
Abstract Considering data obtained from a corpus of database QA dialogues, we address the nature of the discourse structure needed to resolve the various kinds of contextual phenomena found in our corpus. We look at the thematic relations holding between questions and the preceding context and discuss to what extent thematic relatedness plays a role in discourse structure.
Abstract Treebanks are language resources that provide annotations at various levels of linguistic structure starting from the word level. They typically provide syntactic constituent or dependency structures for sentences, but increasingly extend to annotation beyond syntactic structure, including semantic, pragmatic and rhetorical annotation, or go beyond a single language, as in parallel treebanks.
Abstract We present a cross-lingual projection framework for temporal annotations. Automatically obtained TimeML annotations in the English portion of a parallel corpus are transferred to the German translation along a word alignment. Direct projection augmented with shallow heuristic knowledge outperforms the uninformed baseline by 6.64% F1-measure for events, and by 17.93% for time expressions.
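The abstract does not spell out the projection step itself. The following is a minimal sketch of uninformed direct projection along a word alignment, assuming annotations are stored as lists of source token indices and the alignment as a set of (source, target) index pairs. The labels, the example sentence pair and the alignment are hypothetical, and the shallow heuristics the paper adds on top of direct projection are not modelled.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a word alignment as (source_index, target_index) pairs.
Alignment = Set[Tuple[int, int]]


def project_annotations(
    annotations: Dict[str, List[int]], alignment: Alignment
) -> Dict[str, List[int]]:
    """Directly project token-level annotations from source to target tokens."""
    projected: Dict[str, List[int]] = {}
    for label, src_tokens in annotations.items():
        covered = set(src_tokens)
        # Collect every target token aligned to any covered source token.
        tgt_tokens = sorted({t for (s, t) in alignment if s in covered})
        if tgt_tokens:  # annotations with no aligned target token are dropped
            projected[label] = tgt_tokens
    return projected


# English: He(0) left(1) on(2) Monday(3)
# German:  Er(0) ist(1) am(2) Montag(3) gegangen(4)
alignment = {(0, 0), (1, 1), (1, 4), (2, 2), (3, 3)}
english = {"e1:EVENT": [1], "t1:TIMEX3": [3]}
german = project_annotations(english, alignment)
print(german)  # {'e1:EVENT': [1, 4], 't1:TIMEX3': [3]}
```

In this toy example the English event "left" lands on both parts of the German analytic perfect ("ist" and "gegangen"), a discontinuous result of the kind that heuristic knowledge would have to handle on top of the uninformed baseline.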
Abstract We present an approach to model hidden attributes in the compositional semantics of adjective-noun phrases in a distributional model. For the representation of adjective meanings, we reformulate the pattern-based approach for attribute learning of Almuhareb (2006) in a structured vector space model (VSM). This model is complemented by a structured vector space representing attribute dimensions of noun meanings.
We propose an analysis of well-known contrastive data on complex predicate formation in French and Italian, in which we distinguish, following Rosen (1989), between two types of argument composition: full and partial merger of argument structure. We investigate two alternative ways of integrating this analysis into the LFG framework: complex predicate formation in syntax or by lexical rule.
Abstract We present a method for rule-based structure conversion of existing treebanks, which aims at the extraction of linguistically sound, corpus-based grammars in a specific grammatical framework. We apply this method to the NEGRA treebank to derive an LTAG grammar of German. We describe the methodology and tools for structure conversion and LTAG extraction. The conversion and grammar extraction process imports linguistic generalisations that are missing in the original treebank.
Abstract We present a method for corpus-based induction of an LFG syntax-semantics interface for frame semantic processing in a computational LFG parsing architecture. We show how to model frame semantic annotations in an LFG projection architecture, including special phenomena that involve non-isomorphic mappings between levels. Frame semantic annotations are ported from a manually annotated corpus to a “parallel” LFG corpus.
Abstract We present a general approach to formally modelling corpora with multi-layered annotation, thereby inducing a lexicon model in a typed logical representation language, OWL DL. This model can be interpreted as a graph structure that offers flexible querying functionality beyond current XML-based query languages and powerful methods for consistency control. We illustrate our approach by applying it to the syntactically and semantically annotated SALSA/TIGER corpus.
Abstract In this paper we pursue two related goals: First, we establish a conceptual link between chunk-based syntactic structures, as typically assumed in shallow parsing approaches, and principle-based syntactic structures, as assumed in theoretical linguistics research. This conceptual link emerges from the study of configurational vs.
We look at a contrasting picture where a piece of functional/semantic information, namely tense, may be defined morphologically, by tense inflection, or else compositionally in syntax (in conjunction with inflection on syntactic items). The question at issue is how to represent analytic tense formation so as to obtain a uniform functional structure for analytic and synthetic tense formation.
Abstract We present an implemented approach for domain-restricted question answering from structured knowledge sources, based on robust semantic analysis in a hybrid NLP system architecture. We build on a lexical-semantic conceptual structure for question interpretation, which is interfaced with domain-specific concepts and properties in a structured knowledge base. Question interpretation involves a limited amount of domain-specific inferences and accounts for quantificational questions.
In machine translation, ambiguities in the source language often carry across to the target language. These include syntactic ambiguities, such as some prepositional phrase attachments (John saw the man with a telescope / Jean a vu l'homme avec un télescope), or semantic ambiguities, such as quantifier scope (Every student answered a question / Jeder Student beantwortete eine Frage).
Abstract The applicability of ontologies for natural language processing depends on the ability to link ontological concepts and relations to their realisations in texts. We present a general, resource-poor account to create such a linking automatically by extracting Wikipedia articles corresponding to ontology classes. We evaluate our approach in an experiment with the Music Ontology. We consider linking as a promising starting point for subsequent steps of information extraction.
We present an LFG syntax-semantics interface for the semi-automatic annotation of frame semantic roles for German in the SALSA project.
Abstract Named Entity Recognition and Classification (NERC) is a well-studied NLP task typically focused on coarse-grained named entity (NE) classes. NERC for more fine-grained semantic NE classes has not been systematically studied. This paper quantifies the difficulty of fine-grained NERC (FG-NERC) when performed at large scale on the people domain. We apply unsupervised acquisition methods to construct a gold standard dataset for FG-NERC.
Abstract This work describes first steps towards building a system that synchronously generates multimodal (textual and visual) route directions for pedestrians. We pursue a corpus-based approach for building a generation model that produces natural instructions in multiple languages. We conducted an empirical study to collect verbal route directions, and annotated the acquired texts on different levels. Here we describe the experimental setting and an analysis of the collected data.
Abstract We present a general approach to formally modelling corpora with multi-layered annotation in a typed logical representation language, OWL DL. By defining abstractions over the corpus data, we can generalise from a large set of individual corpus annotations, thereby inducing a lexicon model. The resulting combined corpus and lexicon model can be interpreted as a graph structure that offers flexible querying functionality beyond current XML-based query languages.
Abstract This paper presents a supervised approach for identifying generic noun phrases in context. Generic statements express rule-like knowledge about kinds or events. Therefore, their identification is important for the automatic construction of knowledge bases. In particular, the distinction between generic and non-generic statements is crucial for the correct encoding of generic and instance-level information. Generic expressions have been studied extensively in formal semantics.
Abstract We present a treebank conversion method by which we construct an RMRS bank for HPSG parser evaluation from the TIGER Dependency Bank. Our method effectively performs automatic RMRS semantics construction from functional dependencies, following the semantic algebra of Copestake et al. (2001). We present the semantics construction mechanism, and focus on some special phenomena. Automatic conversion is followed by manual validation.
Abstract This paper introduces an attribute selection task as a way to characterize the inherent meaning of property-denoting adjectives in adjective-noun phrases, such as hot in hot summer, which denotes the attribute temperature rather than taste. We formulate this task in a vector space model that represents adjectives and nouns as vectors in a semantic space defined over possible attributes. The vectors incorporate latent semantic information obtained from two variants of LDA topic models.
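A toy illustration of the attribute selection task might look as follows. It assumes adjective and noun vectors are already indexed by attribute dimensions and are composed by pointwise multiplication, one standard composition function; the abstract does not state which composition the paper uses. The attribute inventory and all weights are invented, and the LDA-derived components are not modelled.

```python
from typing import Dict

# Hypothetical attribute inventory; not the paper's actual attribute set.
ATTRIBUTES = ["temperature", "taste", "size", "color"]

Vector = Dict[str, float]


def compose(adj: Vector, noun: Vector) -> Vector:
    """Compose adjective and noun vectors by pointwise multiplication."""
    return {a: adj.get(a, 0.0) * noun.get(a, 0.0) for a in ATTRIBUTES}


def select_attribute(adj: Vector, noun: Vector) -> str:
    """Return the attribute with the highest weight in the composed phrase vector."""
    phrase = compose(adj, noun)
    return max(ATTRIBUTES, key=lambda a: phrase[a])


# Invented toy weights, not estimated from any corpus.
hot = {"temperature": 0.6, "taste": 0.3, "size": 0.05, "color": 0.05}
summer = {"temperature": 0.7, "taste": 0.05, "size": 0.1, "color": 0.15}
curry = {"temperature": 0.2, "taste": 0.7, "size": 0.05, "color": 0.05}

print(select_attribute(hot, summer))  # temperature
print(select_attribute(hot, curry))   # taste
```

Multiplication lets the noun filter the adjective's attribute profile, so the same adjective vector for hot selects different attributes depending on the noun it modifies.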
Abstract This paper gives an overview of an interdisciplinary research project that is concerned with the application of computational linguistics methods to the analysis of the structure and variance of rituals, as investigated in ritual science. We present the motivation for and prospects of a computational approach to ritual research, and explain the choice of specific analysis techniques. We discuss design decisions for data collection and processing and present the general NLP architecture.
Abstract. In this paper we investigate the use of standard natural language processing (NLP) tools and annotation methods for processing linguistic data from ritual science. The work is embedded in an interdisciplinary project that addresses the study of the structure and variance of rituals, as investigated in ritual science, under a new perspective: by applying empirical and quantitative computational linguistic analysis techniques to ritual descriptions.
Abstract Linking implicit semantic roles is a challenging problem in discourse processing. Unlike prior work inspired by SRL, we cast this problem as an anaphora resolution task and embed it in an entity-based coreference resolution (CR) architecture. Our experiments clearly show that CR-oriented features yield the strongest performance, exceeding a strong baseline. We address the problem of data sparsity by applying heuristic labeling techniques, guided by the anaphoric nature of the phenomenon.
Since treebanks were first successfully exploited for training stochastic parsers, treebanks have been under development for many languages. Treebanks further enable evaluation and benchmarking of competing parsing and grammar models. While parser evaluation against treebanks is most natural for treebank-derived grammars, it is extremely difficult for handcrafted grammars that represent higher-level functional or semantic information, such as LFG, HPSG, or CCG grammars (cf. Carroll et al., 2002).
Abstract This contribution investigates novel techniques for error detection in automatic semantic annotations, as an attempt to reconcile error-prone NLP processing with high quality standards required for empirical research in Digital Humanities. We demonstrate the state-of-the-art performance of semantic NLP systems on a corpus of ritual texts and report performance gains we obtain using domain adaptation techniques.
Abstract In this paper we discuss motivations and strategies for generalising over instance-based frame assignment rules that we extract from frame-annotated corpora. Corpus-induced syntax-semantics mapping rules for frame assignment can be used for automatic semantic role labelling of unparsed text, but also to extract linguistic knowledge for a lexical semantic resource with a general syntax-semantics interface.
Manual, large-scale (computational) grammar development is time-consuming, expensive and requires considerable linguistic expertise. More recently, a number of alternatives based on treebank resources (such as Penn-II, Susanne, AP treebank) have been explored. The idea is to automatically "induce", or rather read off, (P)CFG grammars from the parse-annotated treebank resources and to use the treebank grammars thus obtained in (probabilistic) parsing or as a starting point for further grammar development.
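The "read off" idea can be sketched in a few lines: every local tree in the treebank contributes one rule, and rule probabilities are estimated by relative frequency per left-hand side (maximum-likelihood estimation). This is a toy sketch assuming trees are represented as nested tuples; the two-sentence treebank is invented, and real treebank preprocessing (function tags, empty elements, and so on) is ignored.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple, Union

Tree = Union[str, tuple]  # a leaf word, or (label, child1, child2, ...)


def count_rules(tree: Tree, counts: Counter) -> None:
    """Add one count for every local tree (rule) in `tree`."""
    if isinstance(tree, str):
        return
    label, *children = tree
    # Right-hand side: words stay as-is, subtrees contribute their root label.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        count_rules(child, counts)


def read_off_pcfg(treebank) -> Dict[Tuple[str, tuple], float]:
    """Estimate P(lhs -> rhs) as count(lhs -> rhs) / count(lhs)."""
    counts: Counter = Counter()
    for tree in treebank:
        count_rules(tree, counts)
    lhs_totals: Dict[str, int] = defaultdict(int)
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


treebank = [
    ("S", ("NP", "John"), ("VP", ("V", "saw"), ("NP", "Mary"))),
    ("S", ("NP", "Mary"), ("VP", ("V", "slept"))),
]
pcfg = read_off_pcfg(treebank)
print(round(pcfg[("NP", ("Mary",))], 3))  # 0.667
```

Relative-frequency estimation guarantees that the probabilities of all rules sharing a left-hand side sum to one, which is what makes the read-off grammar usable directly in probabilistic parsing.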
Abstract Various methods aim at overcoming the shortage of NLP resources, especially for resource-poor languages. We present a cross-lingual projection account that aims at inducing an annotated treebank to be used for parser induction for Polish. Our approach builds on Hwa et al.'s projection method [7] that we adapt to the LFG framework. The goal of the experiment is the induction of an LFG f-structure bank for Polish. The projection yields competitive results.
Abstract: We describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontology-based information extraction from soccer web pages for automatic population of a knowledge base that can be used for domain-specific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component.
Wikipedia is a large hypertext document consisting of individual articles, or pages, which are connected to one another by links. Most of these links serve not only for navigation within Wikipedia, but above all to point to articles that discuss the linked entity or concept. A property of Wikipedia articles that is important for our project is that each article discusses an unambiguous entity, or clearly indicates which word sense is meant.
Abstract Discourse coherence is an important aspect of natural language that is still understudied in computational linguistics. Our aim is to learn factors that constitute coherent discourse from data, with a focus on how to realize predicate-argument structures (PAS) in a model that exceeds the sentence level. In particular, we aim to study the case of non-realized arguments as a coherence inducing factor. This task can be broken down into two subtasks.
This paper discusses issues relevant to writing large-scale parallel grammars. It is a direct result of our experiences with ParGram, a parallel grammar project involving Xerox PARC (English), XRCE (French), IMS Stuttgart (German), and University of Bergen (Norwegian). The basic goal of the ParGram project is to write large-scale LFG grammars with parallel analyses. In this introduction, we define what we mean by parallel analyses and by large scale, and briefly discuss the system which we use.
Abstract We present a constraint-based syntax-semantics interface for the construction of RMRS (Robust Minimal Recursion Semantics) representations from shallow grammars. The architecture is designed to allow modular interfaces to existing shallow grammars of various depths, ranging from chunk grammars to context-free stochastic grammars. We define modular semantics construction principles in a typed feature structure formalism that allow flexible adaptation to alternative grammars and different languages.
Syntax III: Coordination in Unification-Based Syntactic Theories. Summer semester 2002, 16 July 2002, Anette Frank. Coordination and Discourse Coherence (II). Kehler (1996): Coherence and the Coordinate Structure Constraint. Kehler (2002): Coherence, Reference, and the Theory of Grammar. 1 Coherence and Gapping. Properties of gapping (cf. Sag (1976)): only in coordinate structures. (1) a. Sandy played the guitar, and Betsy the recorder. b. *Sandy played the guitar, while/after/because/although Betsy the recorder.
Abstract Route directions are natural language (NL) statements that specify, for a given navigational task and an automatically computed route representation, a sequence of actions to be followed by the user to reach his or her goal. A corpus-based approach to generate route directions involves (i) the selection of elements along the route that need to be mentioned, and (ii) the induction of a mapping from route elements to linguistic structures that can be used as a basis for NL generation.
Abstract We present a semi-supervised machine-learning approach for the classification of adjectives into property- vs. relation-denoting adjectives, a distinction that is highly relevant for ontology learning. The feasibility of this classification task is evaluated in a human annotation experiment. We observe that token-level annotation of these classes is expensive and difficult. Yet, a careful corpus analysis reveals that adjective classes tend to be stable, with few occurrences of class shifts observed at the token level.
Abstract Generating coherent discourse is an important aspect in natural language generation. Our aim is to learn factors that constitute coherent discourse from data, with a focus on how to realize predicate-argument structures in a model that exceeds the sentence level. We present an important subtask for this overall goal, in which we align predicates across comparable texts, admitting partial argument structure correspondence.
The earliest corpus-based approaches to stochastic parsing (e.g. Sampson et al. (1989), Fujisaki et al. (1989), Sharman et al. (1990), Black (1992)) used a variety of data resources and evaluation techniques.
Abstract This work describes an online application that uses Natural Language Generation (NLG) methods to generate walking directions in combination with dynamic 2D visualisation. We make use of third-party resources, which provide, for a given query, (geographic) routes and landmarks along the way. We present a statistical model that can be used for generating natural language directions.
This paper focusses on two aspects of the verb second (V2) property in German. One of our concerns is to present an empirically and theoretically motivated analysis of German sentence structure and the V2 phenomenon that covers a comprehensive set of data which any analysis of German sentence structure has to account for. The analysis we propose will be formulated in the HPSG framework, but is closely related in many respects to a GB analysis worked out in Haider (1993).
Abstract This paper presents ongoing efforts on developing Word Sense Disambiguation (WSD) resources for the German language, using GermaNet as a basis. We bootstrap two WSD systems for German. (i) We enrich GermaNet with predominant sense information, following previous unsupervised methods to acquire predominant senses of words.
ABSTRACT In LFG, syntactic structure is represented in terms of two levels of syntactic description: c-structure and f-structure. Context-free PS rules with f-descriptions define the functional correspondence between c- and f-structure. Subcategorisation and long-distance dependencies are represented in f-structure, while the functional mapping between c- and f-structure accounts for properties like word order variation.
Challenges in the development of state-of-the-art navigation tools involve the synchronous generation of user-tailored natural language directions and accompanying graphical representations. Of particular interest is the combination of indoor and outdoor scenarios. We describe an architecture for simultaneously generating route directions and suitable visualizations of the navigational context for walking directions.
ABSTRACT Led by the observation of similarities and variances in rituals across times and cultures, ritual scientists are discussing an underlying abstract, and possibly universal, structure of rituals, which nevertheless is subject to variation. In an interdisciplinary project, we investigate the use of computational linguistic techniques to make characteristic properties and structures in rituals overt. To this end, we apply formal and quantitative computational linguistic analysis techniques to textual ritual descriptions.
Abstract We present a distributional vector space model that incorporates Latent Dirichlet Allocation in order to capture the semantic relation holding between adjectives and nouns along interpretable dimensions of meaning: The meaning of adjective-noun phrases is characterized in terms of ontological attributes that are prominent in their compositional semantics. The model is evaluated in a similarity prediction task based on paired adjective-noun phrases from the Mitchell and Lapata (2010) benchmark data.
Having worked with a number of grammatical frameworks over many years at varying depth, I have gained an understanding of their similarities and differences, their respective attractiveness and strengths, but also their biases, which relate to the specific architectural choices they make.
