In corpus studies, when analysing a word, it is often convenient to see all or several grammatical forms of this word as one group. Such a group is referred to as a lemma. Traditionally the lemma is seen as a group of words that share the...
      Corpus Linguistics, Collocations, Lithuanian language, Lemmatization
In this paper, we describe an approach to lemmatisation for Russian nouns, which makes use of a large-scale inheritance lexicon implemented in the lexical representation language DATR (Evans and Gazdar 1996). The lexicon was compiled...
      Languages and Linguistics, Russian, Computational Linguistics, Lemmatization
      Information Retrieval, Spanish, Natural Language Processing, Morphology
      Information Retrieval, Spanish, Morphology, Parsing
This journal article gives an overview of Natural Language Processing (NLP). It introduces NLP and also discusses the current challenges of Natural Language Processing, NLP libraries, and...
      Information Systems, Stemming, Word Clouds, Natural Language Processing (NLP)
This paper deals with the impact of complex morphological structures on essential aspects of lexicology. On the basis of data from the Kartvelian (South-Caucasian) language family consisting of Georgian and its sister-languages, it...
      Georgian Language, Lexicography, Lexicography and Corpus Studies, Kartvelian Languages
This paper discusses the theoretical bases as well as the pragmatic implementation of the lemmatization of the Late Latin Charter Treebanks (LLCT). LLCT is a set of three dependency treebanks (LLCT1, LLCT2, LLCT3) of Early Medieval Latin...
      Latin Language, Charters and Paleography, Treebanks, Early Medieval Period
      Computer Science, Information Retrieval, Languages and Linguistics, Natural Language Processing
This short article presents the result of accuracy tests for currently available Ancient Greek lemmatizers and recently published lemmatized corpora. We ran a blinded experiment in which three highly proficient readers of Ancient Greek...
      Computational Linguistics, Corpus Linguistics, Greek Linguistics, Ancient Greek Language
      Corpus, Lemmatisation, Lemmatization, Co occurrence
The paper presents a spoken corpus of the endangered Torlak dialect from the Timok area of Southeast Serbia. This dialect expresses a great deal of variation in the use of non-standard features under the influence of standard Serbian...
      Dialectology, Corpus Linguistics, Spoken corpus, Corpus Annotation
This article is the second in a trilogy that deals with corpus-driven Bantu lexicography, which is illustrated for Lusoga. The focus here is on the macrostructure and in particular on the building of a lemmatised frequency list directly...
      Bantu Linguistics, Uganda, Corpus Linguistics, Lexicography
Although the task of semantic textual similarity (STS) has gained in prominence in the last few years, annotated STS datasets for model training and evaluation, particularly those with fine-grained similarity scores, remain scarce for...
      Natural Language Processing, Semantics, Computational Linguistics, Serbian
An algorithm and software for preprocessing film reviews in Polish were developed. The algorithm comprises the following steps: text adaptation; tokenization;...
      Stemming, Lemmatization, Text Preprocessing
Zulu uses a conjunctive writing system, that is, a system whereby relatively short linguistic words are joined together to form long orthographic words with complex morphological structures. This has led to the so-called 'stem tradition'...
      Bantu Linguistics, Lexicography, Lexicography and Corpus Studies, Dictionary
      Information Retrieval, Natural Language Processing, Morphology, Parsing
This research focuses on the implementation of Gramatika, a grammar checker designed for the Filipino language given its available resources and linguistic tools. The checker uses hybrid n-grams generated from n-grams of words,...
      Computer Science, Languages and Linguistics, Natural Language Processing, Morphology
      Information Retrieval, Spanish, Natural Language Processing, Computational Linguistics
      Information Retrieval, Clustering, Normalization, Stemming
In this paper we consider a set of natural language processing techniques that can be used to analyze large amounts of texts, focusing on the advanced tokenizer which accounts for a number of complex linguistic phenomena, as well as for...
      Computer Science, Information Retrieval, Natural Language Processing, Corpus Linguistics
      Parsing, Text Analysis, Language Culture and Communication, Concordance
As part of the collaboration established between the consolidated programmes CIFM (Corpus des Inscriptions de la France Médiévale, CESCM, Poitiers) and CBMA (Corpus Burgundiae Medii Aevi, LaMOP, Paris), their teams have assembled a...
      Medieval History, Burgundian history, Written Culture, Epigraphy
The focus of this study is Hellenistic Greek, a variation of Greek that continues to be of particular interest within the humanities. The Hellenistic variant of Greek, we argue, requires tools that are specifically tuned to its...
      Ancient Greek Language, Topic Models, Lemmatization
      Information Retrieval, Natural Language Processing, Morphology, Parsing
      Information Retrieval, Spanish, Natural Language Processing, Morphology
      Statistical Analysis, Linguistics, Natural language, Noun
Abstract. This article describes the ERIAL system for information retrieval, developed within the project of the same name. After an initial external description of the project (Section 1), the environment is presented...
      Information Retrieval, Spanish, Natural Language Processing, Morphology
In this research article an in-depth investigation is presented of the lexicographic treatment of the demonstrative copulative (DC) in Sesotho sa Leboa. This one case study serves as an example to illustrate the so-called 'paradigmatic...
      Lexicography, Lexicography and Corpus Studies, Lemmatization, Demonstrative Copulative
      Information Retrieval, Spanish, Natural Language Processing, Morphology
This article proposes using the mechanisms of productive derivational morphology to group into a single morphological family all words derived from the same grammatical root. In particular,...
      Computer Science, Information Retrieval, Spanish, Natural Language Processing
      Computer Science, Information Retrieval, Information Technology, Natural Language Processing
One of the most important prior tasks for robust part-of-speech tagging is the correct tokenization or segmentation of the texts. This task can involve processes which are much more complex than the simple identification of the different...
      Spanish, Natural Language Processing, Galician, Named Entity Recognition
We describe an on-going project to develop a lexical database of American Sign Language (ASL) as a tool for annotating ASL corpora collected in the United States. Labs within our team complete locally chosen fields using their notation...
      Lemmatization, ID Gloss, ASL corpora
      Linguistics, Lexicography, Lexicography and Corpus Studies, Lemmatization
Lemmatization, finding the basic morphological form of a word in a corpus, is an important step in many natural language processing tasks when working with morphologically rich languages. We describe and evaluate Nefnir, a new open source...
      Morphology, Lemmatization
In this article, we consider the problem of supervised morphological analysis using an approach that differs from those widespread in industry. The article describes a new method of lemmatization based on machine learning algorithms, in...
      Machine Learning, Data Mining, Computational Linguistics, Natural Language Processing (NLP)
The research group L.A.S.L.A. (Laboratoire d'Analyse Statistique des Langues Anciennes, University of Liège, Belgium) began a project of lemmatization and morphosyntactic tagging of Latin texts in 1961. This project continues with new...
      Latin Language, Lemmatization
We present a pattern-based question answering system for Romanian that participated in the Romanian monolingual task of the QA@CLEF 2007 track. We aim to prove that working with a good Boolean search engine and using question type...
      Question Answering System, Search Engine, Tokenization, Question Answering
      Business, Knowledge Management, Architecture, Visualization
We consider a set of natural language processing techniques based on finite-state technology that can be used to analyze huge amounts of texts. These techniques include an advanced tokenizer, a part-of-speech tagger that can manage...
      Information Retrieval, Natural Language Processing, Morphology, Parsing
      Computer Assisted Language Learning, Natural Language Processing, Machine Learning, Computational Modeling
      Information Retrieval, Spanish, Natural Language Processing, Morphology
      Information Retrieval, Spanish, Natural Language Processing, Morphology
      Information Retrieval, Spanish, Natural Language Processing, Morphology
      Spanish, Natural Language Processing, Galician, Named Entity Recognition
      Information Retrieval, Natural Language Processing, Morphology, Stemming
      Lexicology, Cornish Language, Computational Linguistics, Corpus Linguistics