Jose Mari Arriola

    In this article, we explain the first steps towards a grammar-helping tool for Basque using a rule-based approach. Specifically, we describe the first steps carried out to help with verb agreement, some of the difficulties encountered, the linguistic issues that arise when new rules are designed, and future perspectives.
    In this report we present the tags we use when annotating the gold standard of syntactic functions and the decisions taken during its annotation. The gold standard is a necessary resource for evaluating the rule-based surface syntactic parser (the one based on the Constraint Grammar formalism) and can also be useful for developing and evaluating statistical parsers. The tags presented here follow the Constraint Grammar (CG) formalism (Karlsson et al., 1995). Previous experiments show that good results have been obtained when parsing with CG (Karlsson et al., 1995; Samuelsson and Voutilainen, 1997; Tapanainen and Järvinen, 1997; Bick, 2000).
    In this article we present the construction process of SF-EPEC, a syntactically annotated corpus of 300,000 words intended to serve as a gold standard for the surface syntactic processing of Basque. First, we describe the tag set designed for this purpose; because Basque is an agglutinative language, we sometimes had to create compound syntactic tags. We also detail the different phases in the construction of SF-EPEC.
    Automatic sense disambiguation: how to tell a pine cone from an ice cream cone. • Compute the upper bound of this method using WordNet. How correct can this methodology be? That is, words belonging to the same narrow context in SemCor can represent distant correct concepts in WordNet (having other incorrect ones closer). 7 Conclusion. The automatic method for the disambiguation of nouns presented in this paper is ready to use in any general-domain, free-running text, given part-of-speech tags. It does not need any training and uses word sense tags from WordNet, a widely used lexical database. The algorithm is theoretically motivated and offers a general measure of the semantic relatedness for any number of nouns. Conceptual Density has been used for other tasks apart from the disambiguation of free-running text. Its application to automatic spelling correction is outlined in [Agirre et al. 94]. It was also used in Computational Lexicography, enriching dictionary senses with semant...
    We present Maxuxta, a dependency parser for the linguistic processing of Basque, which can serve as a representative of agglutinative languages that are also characterized by the free order of their constituents. The dependency syntactic model is applied to establish the dependency-based grammatical relations between the components within the clause. Such a deep analysis is used to improve
    This is the stand-off GrAF version of the Basque Dependency Treebank (BDT). It is the Reference Corpus for the Processing of Basque (EPEC) annotated at the syntactic level. EPEC is a 300,000-word corpus of standard written journal texts which aims to be a training corpus for the development and improvement of several Natural Language Processing tools. It has been manually tagged at different levels: morphology, partial syntax and semantics. This is the stand-off GrAF version of the Constituent Basque Treebank.
    This work presents the development and implementation of a full morphological analyzer for Basque, an agglutinative language. Several problems (phrase structure inside word-forms, noun ellipsis, multiplicity of values for the same feature and the use of complex linguistic representations) have forced us to go beyond the morphological segmentation of words, and to include an extra module that performs a full morphosyntactic parsing of each word-form. A unification-based word-level grammar has been defined for that purpose. ...
    In this paper we present the tools (lemmatizer-tagger, morphological analyzer, linguistic environment), applications (spelling checker) and resources (lexical database, corpora and grammars) developed for the treatment of Basque by the IXA Group. In addition, the main research lines and the strategy followed by the group are presented. 1 Introduction. The IXA Group is composed of 18 members from the Computer Science Faculty of the University of the Basque Country and 2 members of UZEI (Basque Centre for Terminology and ...
    This paper is an initial proposal to promote research and development in language-independent tools. The definition of a basic HLT toolkit is vital to allow the development of lesser-used languages. What kind of public HLT products could currently be integrated into a basic toolkit portable to any language? We try to answer this question by examining the fifty items registered in the Natural Language Software Registry as language-independent tools. We propose a toolkit having standard representation of ...
    Abstract: EDBL is a lexical database for the Basque language. This article presents the design and main features of this database, conceived as a general lexicographic base for the automatic processing of Basque. The conceptual schema of EDBL is explained by means of an extended E/R diagram and feature structures. The implementation of the database in a commercial relational DBMS and the problems encountered during this implementation are discussed.
    This paper describes the methodology we have adopted to ensure the quality of the Basque WordNet in terms of coverage, correctness, completeness and adequacy. The Basque WordNet follows the EuroWordNet framework and, basically, it is produced using a semi-automatic method that links Basque words to the English WordNet. We have found that in order to ensure proper linguistic quality and avoid excessive English bias, a double manual pass on the automatically produced Basque synsets is desirable: a first concept-to ...
    In this paper we describe some experiments based on a previous Constraint Grammar (CG-2) of Basque Complex Postpositions. We present the development and the evaluation of the rewritten CG-2 and the new CG-3 grammars for processing Basque Complex Postpositions.
    Abstract: This paper reports on work in progress to improve shallow parsing for Basque. The practical goal of our work is to enrich the information of the shallow parser with linguistic information for analyzing sequences containing an N that instantiates a kind of quantification of the ...
    The aim of this work is to reuse the EPEC-DEP dependency treebank (BDT) to build the gold standard for the surface syntax of Basque. The basic step is a comparative study of the two formalisms applied to the same corpus: Constraint Grammar (CG) and Dependency Grammar (DP). As a result of this study, we have established the linguistic criteria needed to derive the syntactic functions in CG style. These criteria have been implemented and evaluated: in 75% of the cases we are able to derive the syntactic functions automatically to build the gold standard.