Lemmatization Research Papers

In corpus studies, when analysing a word, it is often convenient to treat all or several grammatical forms of that word as one group. Such a group is referred to as a lemma. Traditionally the lemma is seen as a group of words that share the...

- by Andrius Utka
  Corpus Linguistics, Collocations, Lithuanian language, Lemmatization

In this paper, we describe an approach to lemmatisation for Russian nouns, which makes use of a large-scale inheritance lexicon implemented in the lexical representation language DATR (Evans and Gazdar 1996). The lexicon was compiled...

- by Jesus Vilares
  Information Retrieval, Spanish, Natural Language Processing, Morphology

- by Jesus Vilares
  Information Retrieval, Spanish, Morphology, Parsing

This journal provides an overview of and introduction to Natural Language Processing (NLP). It is also intended to focus on the current challenges of NLP, NLP libraries, and...

This paper deals with the impact of complex morphological structures on essential aspects of lexicology. On the basis of data from the Kartvelian (South-Caucasian) language family consisting of Georgian and its sister-languages, it...

This paper discusses the theoretical bases as well as the pragmatic implementation of the lemmatization of the Late Latin Charter Treebanks (LLCT). LLCT is a set of three dependency treebanks (LLCT1, LLCT2, LLCT3) of Early Medieval Latin...

This short article presents the results of accuracy tests for currently available Ancient Greek lemmatizers and recently published lemmatized corpora. We ran a blinded experiment in which three highly proficient readers of Ancient Greek...

- by Sylvie Mellet
  Corpus, Lemmatisation, Lemmatization, Co occurrence

The paper presents a spoken corpus of the endangered Torlak dialect from the Timok area of Southeast Serbia. This dialect exhibits a great deal of variation in the use of non-standard features under the influence of standard Serbian...

- by Teodora Vuković
  Dialectology, Corpus Linguistics, Spoken corpus, Corpus Annotation

This article is the second in a trilogy that deals with corpus-driven Bantu lexicography, which is illustrated for Lusoga. The focus here is on the macrostructure and in particular on the building of a lemmatised frequency list directly...

- by Gilles-Maurice de Schryver and Minah Nabirye
  Bantu Linguistics, Uganda, Corpus Linguistics, Lexicography

Although the task of semantic textual similarity (STS) has gained in prominence in the last few years, annotated STS datasets for model training and evaluation, particularly those with fine-grained similarity scores, remain scarce for...

- by Vuk Batanović and Milos Cvetanovic
  Natural Language Processing, Semantics, Computational Linguistics, Serbian

An algorithm and software were developed for preprocessing film reviews written in Polish. The algorithm contains the following steps: Text Adaptation Procedure; Procedure of Tokenization;...

- by Nina Rizun
  Stemming, Lemmatization, Text Preprocessing

Zulu uses a conjunctive writing system, that is, a system whereby relatively short linguistic words are joined together to form long orthographic words with complex morphological structures. This has led to the so-called 'stem tradition'...

- by Jesus Vilares
  Information Retrieval, Natural Language Processing, Morphology, Parsing

This research focuses on the implementation of Gramatika, a grammar checker designed for the Filipino language given its available resources and linguistic tools. The checker uses hybrid n-grams generated from n-grams of words,...

- by Kalervo Järvelin
  Information Retrieval, Clustering, Normalization, Stemming

In this paper we consider a set of natural language processing techniques that can be used to analyze large amounts of text, focusing on the advanced tokenizer which accounts for a number of complex linguistic phenomena, as well as for...

- by Ray Siemens
  Parsing, Text Analysis, Language Culture and Communication, Concordance

As part of the collaboration established by the consolidated programmes of the CIFM (Corpus des Inscriptions de la France Médiévale, CESCM Poitiers) and the CBMA (Corpus Burgundiae Medii Aevi, LaMOP, Paris), their teams have assembled a...

- by Estelle INGRAND-VARENNE, Eliana Magnani, Nicolas Perreaux, and Davide Gherdevich
  Medieval History, Burgundian history, Written Culture, Epigraphy

The focus of this study is Hellenistic Greek, a variation of Greek that continues to be of particular interest within the humanities. The Hellenistic variant of Greek, we argue, requires tools that are specifically tuned to its...

- by Ryder Wishart
  Ancient Greek Language, Topic Models, Lemmatization

- by Mario Barcala
  Information Retrieval, Natural Language Processing, Morphology, Parsing

- by J. Graña
  Information Retrieval, Spanish, Natural Language Processing, Morphology

- by Kimmo Kettunen
  Statistical Analysis, Linguistics, Natural language, Noun

Abstract: This article describes the ERIAL system for Information Retrieval, developed within the framework of the project of the same name. After an initial external description of the project (Section 1), it presents the environment...

In this research article an in-depth investigation is presented of the lexicographic treatment of the demonstrative copulative (DC) in Sesotho sa Leboa. This one case study serves as an example to illustrate the so-called 'paradigmatic...

- by Miguel Angel Alonso Pardo and Jesus Vilares
  Information Retrieval, Spanish, Natural Language Processing, Morphology

This article proposes the use of productive derivational morphology mechanisms to group into a single morphological family all words derived from the same grammatical root. In particular,...

One of the most important prior tasks for robust part-of-speech tagging is the correct tokenization or segmentation of the texts. This task can involve processes which are much more complex than the simple identification of the different...

We describe an on-going project to develop a lexical database of American Sign Language (ASL) as a tool for annotating ASL corpora collected in the United States. Labs within our team complete locally chosen fields using their notation...

- by Leah Geer
  Lemmatization, ID Gloss, ASL corpora

Lemmatization, finding the basic morphological form of a word in a corpus, is an important step in many natural language processing tasks when working with morphologically rich languages. We describe and evaluate Nefnir, a new open source...

- by Hrafn Loftsson
  Morphology, Lemmatization

In this article, we consider the problem of supervised morphological analysis using an approach that differs from its counterparts widely used in industry. The article describes a new lemmatization method based on machine learning algorithms, in...

The research group L.A.S.L.A. (Laboratoire d'Analyse Statistique des Langues Anciennes, University of Liège, Belgium) began a project of lemmatization and morphosyntactic tagging of Latin texts in 1961. This project continues with new...

- by Yves Ouvrard and Margherita Fantoli
  Latin Language, Lemmatization

We present a pattern-based question answering system for Romanian that participated in the Romanian monolingual task of the QA@CLEF 2007 track. We aim to prove that working with a good Boolean search engine and using question type...

- by Dan Stefanescu, Radu Ion, and Dan Tufis
  Question Answering System, Search Engine, Tokenization, Question Answering

- by Ronen Feldman
  Business, Knowledge Management, Architecture, Visualization

We consider a set of natural language processing techniques based on finite-state technology that can be used to analyze huge amounts of text. These techniques include an advanced tokenizer, a part-of-speech tagger that can manage...

- by Mario Barcala
  Information Retrieval, Natural Language Processing, Morphology, Parsing

- by Jesus Vilares
  Information Retrieval, Spanish, Natural Language Processing, Morphology

- by Jesus Vilares
  Information Retrieval, Spanish, Natural Language Processing, Morphology

- by Jesus Vilares
  Spanish, Natural Language Processing, Galician, Named Entity Recognition

- by Jesus Vilares
  Information Retrieval, Natural Language Processing, Morphology, Stemming

- by Jon Mills
  Lexicology, Cornish Language, Computational Linguistics, Corpus Linguistics
