With increasingly higher numbers of non-English language web searchers the problems of efficient handling of non-English Web documents and user queries are becoming major issues for search engines. The main aim of this review paper is to... more
      Information Systems, Information Retrieval, English language, Web Mining
La production de corpus d'occitan médiéval et prémoderne: problèmes et perspectives de travail. This repository contains the files and models described in: Jean-Baptiste Camps and Gilles Guilhem Couffignal, « La production de corpus... more
      Digital Humanities, Romance philology, Corpus Linguistics, Medieval Occitan Literature
In this article we present a series of Natural Language Processing techniques applied to term normalization in Textual Information Retrieval. The aim of these techniques is to handle the phenomena of... more
      Computer Science, Information Retrieval, Information Technology, Natural Language Processing
Automatic methods of morphological analysis and lemmatization designed for standard literary Russian may perform poorly when applied to so-called social media (microblogs, social networks and... more
      Social Media, Corrections, Spelling Errors, Levenshtein distance
This research focuses on the implementation of Gramatika, a grammar checker designed for the Filipino language given its available resources and linguistic tools. The checker uses hybrid n-grams generated from n-grams of words,... more
      Languages and Linguistics, Natural Language Processing, Morphology, Filipino
This project sets out to discover and develop techniques for the lemmatisation of a historical corpus of the Cornish language in order that a lemmatised dictionary macrostructure can be generated from the corpus. The system should be... more
      Lexicology, Cornish Language, Computational Linguistics, Lexicography
A spelling checker is a tool that can detect spelling errors in a word or text. Spelling checkers for Indonesian generally work by comparing... more
      Spelling Errors, Morphological Analysis, Typographical Errors, Bahasa Indonesia
      Vocabulary, Lexicography, Dictionary, Lemma
The paper presents the system used in the EvaLatin shared task to POS tag and lemmatize Latin. It consists of two components. A gradient boosting machine (LightGBM) is used for POS tagging, mainly fed with pre-computed word embeddings of... more
      Machine Learning, Latin Language, POS tagging, Lemmatization
This paper presents a newly funded international project for machine translation and automated analysis of ancient cuneiform languages where NLP specialists and Assyriologists collaborate to create an information retrieval system for... more
      Information Retrieval, Languages and Linguistics, Natural Language Processing, Assyriology
Processing of the Arabic language is very important and topical these days. Arabic is the sixth most used language in the world. The problem of stemming is very important in information retrieval, knowledge mining, and language processing. The... more
      Arabic morphology, Information Retrieval, Arabic Natural Language Processing, Lemmatization
Traditionally, Zulu adjectives have been lemmatized under their stems only. In this research article, an in-depth analysis is undertaken to make a case for the lemmatization of all frequent adjectival forms with their adjective concords... more
      Lexicography, Zulu, Adjectives, Lemmatization
      Digital Humanities, Digital Edition, Digital Editing, Informatica umanistica
The authors of this article firmly believe in the advantages of utilising a corpus for lemma-sign list creation. However, one should not overreact and assume that alternative methods for the creation of a dictionary’s macrostructure have... more
      Lexicography, Lexicography and Corpus Studies, Northern Sotho, Lemmatization
SUMMARY This study examines L2 French learner corpora for adjective complexity through the data in The Newcastle Corpus (Myles & Mitchell, 2016). Data suggest that over the span of one year student usage patterns become more native-like... more
      Languages, Second Language Acquisition, Languages and Linguistics, Historical Linguistics
(1) Introduction (2) Corpora and the compilation of the lemma-sign list (3) Corpora and the battle against inconsistencies (4) Corpora as an aid for conjunctively written languages (5) Corpora as the key to writing better dictionary... more
      Swahili, Lexicography, Lexicography and Corpus Studies, Zulu
      Information Retrieval, Digital Humanities, Languages and Linguistics, Natural Language Processing
This paper deals with the impact of complex morphological structures on essential aspects of lexicology. On the basis of data from the Kartvelian (South-Caucasian) language family consisting of Georgian and its... more
      Georgian Language, Lexicography, Kartvelian Languages, Svan language
The aim of this article is (a) to reflect on the contributions made by P.S. Groenewald to the field of lexicography in South Africa, focusing on the importance of determining the relative frequency of individual words in Sesotho sa Leboa,... more
      Lexicography, Lexicography and Corpus Studies, Northern Sotho, Lemmatization
In Zulu, there are three kinds of quantitatives: inclusive, exclusive and numeral. For the lemmatization of these, even existing traditional dictionaries felt the need to move away from a pure 'stem' approach towards a 'word' approach. In... more
      Lexicography, Zulu, Lemmatization, Quantitative Pronouns
This short article presents the result of accuracy tests for currently available Ancient Greek lemmatizers and recently published lemmatized corpora. We ran a blinded experiment in which three highly proficient readers of Ancient Greek... more
      Computational Linguistics, Corpus Linguistics, Greek Linguistics, Ancient Greek Language
This paper deals with the impact of complex morphological structures on essential aspects of lexicology. On the basis of data from the Kartvelian (South-Caucasian) language family consisting of Georgian and its sister-languages, it... more
      Georgian Language, Lexicography, Lexicography and Corpus Studies, Kartvelian Languages
This paper explores the problem of developing NLP tools for morphologically rich and orthographically inconsistent classical languages. It is a case study of building a lemmatizer for Old Irish using only a dictionary and an unlabeled... more
      Natural Language Processing, Computational Linguistics, Old Irish Language and Literature, Lemmatization
This cheat sheet summarizes the essentials of the work done during the CeLiSo (EA7332) workshop day « Créer soi-même un corpus étiqueté d'un million de mots en anglais, français ou allemand » at Université Paris IV-Sorbonne on 25/03/2016. It... more
      Corpus Linguistics, Part of Speech Tagging, Lemmatization, Linguistique De Corpus
At a time when the amount of data available, more or less freely, is growing substantially thanks to digital corpora, editions and libraries, the development of tools for data mining or... more
      Machine Learning, Romance philology, Medieval Occitan Literature, Occitan Language
Lemmatization is a process of finding the base morphological form (lemma) of a word. It is an important step in many natural language processing, information retrieval, and information extraction tasks, among others. We present an... more
      Decision Trees, Decision Tree Classification, Natural Language Processing(NLP), Lemmatization
Sentiment classification of texts written in Serbian is still an under-researched topic. One of the open issues is how the different forms of morphological normalization affect the performances of different sentiment classifiers and which... more
      Natural Language Processing, Machine Learning, Computational Linguistics, Classification (Machine Learning)
Stemming is the main step used for handling morphologically rich languages such as Arabic. It is usually used in several types of applications such as natural language processing, information retrieval, and text mining. The goal of... more
      Teaching Arabic as a Foreign Language (TAFL), Corpus Linguistics, Part of Speech Tagging, Lemmatization
(1) Introduction (2) Lemma-sign lists in dictionaries for the elementary level (3) Compiling the lemma-sign list of the Junior Dictionary (4) Comparison of the compiled lemma-sign list with the manually excerpted vocabulary (5) The... more
      Lexicography, Domain Specific Languages, Lemmatization
REVIEW. Fragments, (Studies in Iconology, 14), Leuven-Walpole, 2018 (Fragments is co-edited by Stephanie Heremans and is a limited edition and celebratory publication signed by the author). This 400-page book consists of 110 lemmata,... more
      Aby Warburg, Encyclopedism, Fragments and Aphorisms, Lemmatization
      Digital Humanities, Aristotle, Ancient Greek Philosophy, Aristotélisme
The aim of this article is to analyze traditional approaches to the lemmatization of nouns on the macrostructural level in African languages against the background of the user perspective, the physical limitations on volume, the... more
      Lexicography, Lexicography and Corpus Studies, Bantu languages, Nouns
We present AcTo, a network of integrated projects for the development of language resources and tools for Medieval Occitan. This abstract illustrates the resources in the network, as well as the first steps towards their integration,... more
      Natural Language Processing, Corpus Linguistics, Lexicography, Medieval Occitan Literature
This paper introduces the main components of the downloadable package of the 3.0 version of the morphological analyser for Latin Lemlat. The processes of word form analysis and treatment of spelling variation performed by the tool are... more
      Historical Linguistics, Computational Linguistics, Morphology, Computational Linguistics & NLP
We present a new mixed method lemmatizer for Icelandic, Lemmald, which achieves good performance by relying on IceTagger [1] for tagging and The Icelandic Frequency Dictionary [2] corpus for training. We combine the advantages of... more
      Natural Language Processing, Machine Learning, Computational Linguistics, Normalization
We describe and compare two tools for processing Middle Russian texts. Both tools provide lemmatization, part-of-speech and morphological annotation. One (“RNC”) was developed for annotating texts in the Russian National Corpus and is... more
      Corpus Linguistics, Treebanks, POS tagging, Lemmatization
(1) Defining a dictionary’s macrostructure (2) On the need for rulers, part 1 (3) Part-of-Speech rulers (‘POS Rulers’) (4) On the need for rulers, part 2 (5) Multidimensional lexicographic rulers (6) Characterising POS Rulers and... more
      Lexicography, Lexicography and Corpus Studies, Lemmatization
In this article a four-step methodology is proposed for the creation of the lemma-sign list of a Nguni-language reference work. The theoretical principles are illustrated throughout with a full-scale case study revolving around... more
      Lexicography, IsiNdebele, Lemmatization
An open issue in the sentiment classification of texts written in Serbian is the effect of different forms of morphological normalization and the usefulness of leveraging large amounts of unlabeled texts. In this paper, we assess the... more
      Natural Language Processing, Semantics, Computational Linguistics, Classification (Machine Learning)
      Question Answering System, Search Engine, Tokenization, Question Answering
We describe an on-going project to develop a lexical database of American Sign Language (ASL) as a tool for annotating ASL corpora collected in the United States. Labs within our team complete locally chosen fields using their notation... more
      Lemmatization, ID Gloss, ASL corpora
Abstract. This article describes the ERIAL system for Information Retrieval, developed within the project of the same name. After an initial external description of the project (Section 1), we present the environment... more
      Information Retrieval, Philosophy, Spanish, Natural Language Processing
In this article, we consider the problem of supervised morphological analysis using an approach that differs from the analogs widespread in industry. The article describes a new method of lemmatization based on the algorithms of machine learning, in... more
      Computer Science, Machine Learning, Data Mining, Computational Linguistics
Presentation of the Pandora lemmatizer/annotator at the workshop of the Lemmes group of the Consortium Sources Médiévales (COSME), organized by Eliana Magnani; Paris, Institut de recherche et d'histoire des textes, 6 November 2017.
      Natural Language Processing, Old French, Medieval Latin, Artificial Neural Networks
      Phonology, Syllable, Procesamiento del Lenguaje Natural, Lemmatization
      Stemming, POS tagging, Natural Language Parsing, Lemmatization
The research group L.A.S.L.A. (Laboratoire d’Analyse Statistique des Langues Anciennes, University of Liege, Belgium) began a project of lemmatization and morphosyntactic tagging of Latin texts in 1961. This project continues with new... more
      Computer Science, Latin Language, Lemmatization
© Springer-Verlag. Abstract. In this, our first participation in CLEF, we have applied Natural Language Processing techniques for single word and multiword term conflation. We have tested several approaches at different levels of text... more
      Information Retrieval, Spanish, Natural Language Processing, Morphology