About

Location

Institut orientaliste
Collège Érasme
Place Blaise Pascal, 1
1348 Louvain-la-Neuve
bte L3.03.32
Belgique

Phone Number

(+32 0)10 47 49 10

Bastien Kindt

UCLouvain (University of Louvain), Institut des Civilisations, Arts et Lettres (INCAL), Faculty Member

Research Interests:
Greek lexicology, Greek Lexicography, Natural Language Processing, Byzantine Hagiography, Ancient Greek Language, Byzantine Studies, Lexicology, Ancient Greek, Kartvelian Languages, Lemmatization, Greek Literature, Lexicography, Greek Language, Late Antiquity, Greek Grammar, Dictionaries, Lexicografía Griega, Diccionarios De Griego Antiguo, Ancient Greek Vocabulary, Liddell Scott Jones, Byzantine Archaeology, Classics, Early Christianity, Roman History, Ancient Greek History, and Roman Archaeology
About:
Scientific collaborator at UCLouvain (Louvain-la-Neuve, Belgium)
NLP Resources & Digital Corpus Developer at Peeters Publishers (Leuven, Belgium).
I graduated in both Classics and Oriental Philology and History at UCLouvain. I am a Hellenist with a major interest in the languages of the Christian East, on the one hand, and in natural language processing, on the other.
At present, I am the coordinator of the GREgORI project directed by Professor Bernard Coulie at UCLouvain (see https://uclouvain.be/fr/instituts-recherche/incal/ciol/gregori-project.html).
The project aims to provide researchers with tagged corpora of texts written in classical and oriental languages (mainly Greek, Armenian, Georgian, Arabic, and Syriac). This includes preparing lemmatized indexes and concordances and designing online user interfaces and search capabilities for Greek and oriental texts. Samples of tagged corpora are available free of charge on the interfaces of the GREgORI Project (see https://www.v2.gregoriproject.com).
Processed texts are turned into corpora enriched with lexical (lemma), part-of-speech (noun, adjective, verb, etc.), and grammatical tags (case, gender, tense, person, number, etc.). By handling such data with the appropriate query engines, users can search for words or expressions in tagged corpora and gather linguistic materials in order to automatically create indexes, concordances, and other lexicographical tools (frequency indexes, inverse indexes, etc.), paving the way for linguistic, philological, or historical studies.
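As a rough illustration of how a tagged corpus can feed an index, a concordance, and a frequency index, here is a minimal Python sketch. It is not the GREgORI query engine: the token layout (form, lemma, part of speech), the tag names, and the function names are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical tagged tokens: (word form, lemma, part of speech).
# The tag names are illustrative only, not the GREgORI tagset.
TOKENS = [
    ("ἐν", "ἐν", "PREP"),
    ("ἀρχῇ", "ἀρχή", "NOUN"),
    ("ἦν", "εἰμί", "VERB"),
    ("ὁ", "ὁ", "ART"),
    ("λόγος", "λόγος", "NOUN"),
    ("καὶ", "καί", "CONJ"),
    ("ὁ", "ὁ", "ART"),
    ("λόγος", "λόγος", "NOUN"),
    ("ἦν", "εἰμί", "VERB"),
]


def lemma_index(tokens):
    """Map each lemma to the positions where one of its forms occurs."""
    index = defaultdict(list)
    for position, (_form, lemma, _pos) in enumerate(tokens):
        index[lemma].append(position)
    return dict(index)


def concordance(tokens, lemma, window=2):
    """Return (left context, keyword, right context) for each occurrence of a lemma."""
    forms = [form for form, _lemma, _pos in tokens]
    lines = []
    for position in lemma_index(tokens).get(lemma, []):
        left = " ".join(forms[max(0, position - window):position])
        right = " ".join(forms[position + 1:position + 1 + window])
        lines.append((left, forms[position], right))
    return lines


def frequency_index(tokens):
    """Return (lemma, count) pairs sorted by descending count, then by lemma."""
    index = lemma_index(tokens)
    return sorted(
        ((lemma, len(positions)) for lemma, positions in index.items()),
        key=lambda item: (-item[1], item[0]),
    )
```

Calling `concordance(TOKENS, "εἰμί")` retrieves every occurrence of the verb through its lemma, whatever inflected form appears in the text, which is the main benefit of lemmatization over plain string search.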
These developments are carried out in cooperation with scholars of the Oriental Institute of UCLouvain (for their linguistic expertise), with Calfa (for IT developments; see https://calfa.fr), and with other researchers and academic teams, both in Belgium and abroad.
Ongoing projects include the creation of digital versions of the Corpus Scriptorum Christianorum Orientalium series, in collaboration with Peeters Publishers (Leuven, Belgium), and the processing of the Syriac texts published by the project "Florilegia Syriaca. The Intercultural Dissemination of Greek Christian Thought in Syriac and Arabic in the First Millennium CE" (Venice) (see https://cordis.europa.eu/project/rcn/212198).

Papers

Bibliographie du projet GREgORI 1990-...

Research Interests:
Patristics, Georgian Language, Armenian Language, Byzantine historiography, Ancient Greek Language, POS tagging, Natural Language Processing (NLP), Lemmatization, Syriac Language, Ancient Greek, Concordances, Lexicology, Greek (Byzantine) Texts, Keyword in Context (KWIC), and Recursive Neural network (RNN)

The colophons of Armenian manuscripts constitute a large textual corpus spanning a millennium of written culture. These texts are highly diverse and rich in linguistic variation. This poses a challenge to NLP tools, especially since linguistic resources designed or suited for Armenian are still scarce. In this paper, we deal with a sub-corpus of colophons written to commemorate the rescue of a manuscript and dating from 1286 to ca. 1450, a thematic group distinguished by a particularly high concentration of words exhibiting linguistic variation. The text is processed (lemmatization, POS-tagging, and inflectional tagging) using the tools of the GREgORI Project, and the results are evaluated. Through a selection of examples, we show how variation is dealt with at each linguistic level (phonology, orthography, inflection, vocabulary, syntax). Complex variation, at the level of tokens or lemmata, is considered as well. The results of this work are used to enrich and refine the linguistic resources of the GREgORI project, which in turn benefits the processing of other texts.

Publication Date: 2022

Research Interests:
Corpus Linguistics, Armenian Language, Part of Speech Tagging, Lemmatization, and Armenian Colophons

The aim of this paper is to evaluate a lexical analysis (mainly lemmatization and POS-tagging) of a sample of the ancient Armenian version of the Adversus Haereses by Irenaeus of Lyons (2nd c.), using a hybrid approach based on digital dictionaries on the one hand and on a Recurrent Neural Network (RNN) on the other. The quality of the results is assessed by comparing the data produced by each method with manually checked data. In the present case, 98.37% of the results are correct with the first (lexical) approach and 74.64% with the second (RNN). Both methods nevertheless have advantages and disadvantages, which argues for the hybrid method. The linguistic resources implemented here are jointly developed and tested by GREgORI and Calfa.
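The kind of evaluation described here, scoring each method against manually checked data, can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the toy data, the function names, and in particular the fallback rule in `hybrid`, since the abstract does not say how the two methods are combined. The toy figures are unrelated to the percentages reported in the paper.

```python
def accuracy(predicted, gold):
    """Share of tokens whose (lemma, POS) analysis matches the manually checked data."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned token by token")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


def hybrid(lexical, neural):
    """Assumed combination rule: keep the dictionary analysis when there is one,
    otherwise fall back to the neural prediction."""
    return [lex if lex is not None else neu for lex, neu in zip(lexical, neural)]


# Toy, hypothetical analyses for four tokens: (lemma, POS) pairs.
GOLD = [("a", "NOUN"), ("b", "VERB"), ("c", "ADJ"), ("d", "NOUN")]
# None marks a form that the dictionary-based method could not analyse.
LEXICAL = [("a", "NOUN"), ("b", "VERB"), None, ("d", "NOUN")]
NEURAL = [("a", "NOUN"), ("x", "VERB"), ("c", "ADJ"), ("d", "VERB")]

if __name__ == "__main__":
    for name, analysis in [("lexical", LEXICAL), ("neural", NEURAL),
                           ("hybrid", hybrid(LEXICAL, NEURAL))]:
        print(f"{name}: {accuracy(analysis, GOLD):.2%}")
```

In this toy setup the dictionary method is precise but leaves one form unanalysed, while the neural model covers every form but makes more errors. That contrast is one plausible reading of why a combination of the two can outperform either alone.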

Publication Date: 2022

Research Interests:
Armenian Language, Recurrent Neural Network, Part of Speech Tagging, Lemmatization, and Irénée de Lyon

Creating a digital corpus enriched with full linguistic annotations classically involves several manual steps of acquisition, processing, and data display. Processing presupposes the existence of dedicated, specialised analysis tools adapted to the state of the language used in the corpus. This paper describes a semi-supervised process for building Armenian corpora from scanned documents.
This method is based on a chain of applications pre-trained by Calfa and GREgORI that enables the complete processing of texts, from automated input to linguistic analysis and data display. We assess this methodology and the benefits of model specialisation, based on digitised copies of a 17th-century manuscript of the Four Gospels (Walters MS W541 = BAL W541, Amida Gospels, ff. 113v-117r: Lk 1:1‑78).

Publication Date: 2022

Research Interests:
Corpus Linguistics, Armenian Language, Image Processing, OCR, Pattern Recognition, Part of Speech Tagging, Lemmatization, HTR Technology, and Recursive Neural network (RNN)

The DTC corpus brings together historical texts written in Greek during the Byzantine period. These texts were analyzed semi-automatically (lemmatization and POS-tagging) using the computer tools and linguistic resources of the GREgORI project (UCLouvain, Louvain-la-Neuve, Belgium), which specializes in the NLP of Greek and the languages of the Christian East. A second analysis, performed by a neural network, was carried out in collaboration with the company Calfa (Paris, France), which develops NLP tools for Armenian and implements artificial-intelligence approaches. This study compares and evaluates the results produced by the two methods and proposes a hybrid approach for the processing of the languages concerned.

Publication Date: 2023

Research Interests:
Greek Language, Corpus Linguistics, Recurrent Neural Network, Byzantine Greek, Part of Speech Tagging, and Lemmatization

Lemmatisation automatique des sources en géorgien ancien

Research Interests:
Georgian Language, Corpus Linguistics, Computational Linguistics & NLP, Part of Speech Tagging, and Lemmatization

The GREgORI project provides scholars with lemmatized corpora of texts written in Greek and in the main languages of the Christian East. Attested word-forms are linked with lemma, POS-, and inflectional tags. This work makes it possible to produce lemmatized indexes and concordances, and to disseminate these corpora by using web-based interfaces. This paper gives an overview of the goals of the GREgORI project and lists the morphosyntactic and inflectional tags used for the analysis of the ancient Armenian language.

DOI: 10.2143/MUS.135.1.3290656

Publication Date: 2022

Publication Name: Le Muséon

Research Interests:
Natural Language Processing, Armenian Studies, Computational Linguistics, Morphosyntax, Classical Armenian, Armenian Language, Computational Linguistics & NLP, and Lemmatization

Volume: 7

More Info: https://uclouvain.be/fr/instituts-recherche/incal/ciol/babelao-7-2018.html

Page Numbers: 51-80

Publication Date: 2018

Publication Name: BABELAO

Research Interests:
Greek Language, Corpus Linguistics, Gregory of Nazianzus, Syriac Studies, Corpus Linguistics and Translation Studies, Syriac (Languages And Linguistics), Aramaic/Syriac, Ancient Greek, Concordances, Lexicology, and Concordances

Research Interests:
Lexicology, Natural Language Processing, Gregory of Nazianzus, Isaac of Nineveh, Lemmatization, and Lexical Coverage

Volume: 6

Publication Date: 2018

Publication Name: Journal of Data Mining & Digital Humanities

Research Interests:
Digital Humanities, Corpus Linguistics, and Georgian Studies

More Info: with Jean-Marie Auwers

Research Interests:
Septuagint

Research Interests:
Lexicology, Ancient Greek, Natural Language Processing

Research Interests:
Hagiography, History, Archaeology

Books

The twenty-three publications collected in this volume are devoted to the LXX of Jeremiah and to Baruch, his secretary. Most of them aim to show that the received Hebrew text (MT, long text) is a reworking of a Hebrew model that was translated into Greek and preserved in the Septuagint (LXX, short text placed under the responsibility of Baruch). To this end, P.-M. Bogaert applies a "differential" exegesis that compares, analyzes, and explains the divergences between the two forms of the book. The volume opens with more general contributions; those that follow deal with passages chosen for their interest: differences in content and differences in order. A final, previously unpublished contribution offers a provisional synthesis that aims to characterize the short text in its own right. The collection is introduced by a preface in English by Professor Emanuel Tov of the Hebrew University of Jerusalem (Publisher's blurb, Peeters Publishers, 2020).

Publication Date: 2020

Research Interests:
Septuagint, Septuaginta Text, Septuagint, LXX, Hebrew Bible/Old Testament, and Septuagint and Peshitta

Research Interests:
Byzantine Studies and Ancient Greek, Concordances, Lexicology

Research Interests:
Ancient Greek, Concordances, Lexicology

Research Interests:
archaeology, onomastic, ancient Kerkyra

Samples of lemmatized concordances

Publication Date: 2018

Research Interests:
Gregory of Nazianzus, Syriac Studies, and Syriac Christianity

Full version available at the following address: https://uclouvain.be/fr/instituts-recherche/incal/ciol/the-concordances-of-the-gregori-project.html

Publication Date: 2018

Research Interests:
Gregory of Nyssa and Ancient Greek, Concordances, Lexicology

Research Interests:
Digital Humanities, Natural Language Processing, and Kartvelian Languages

Bibliography of the GREgORI Project

Bibliographie du projet GREgORI (ordre chronologique)

Research Interests:
Georgian Language, Syriac (Languages And Linguistics), Armenian Language, Ancient Greek Language, POS tagging, Natural Language Processing (NLP), Lemmatization, Ancient Greek, Concordances, Lexicology, and Recursive Neural network (RNN)

