Papers by Chiara Palladino
The identification and classification of names in ancient texts, especially Classical ones like Homer's Iliad, is an essential task that supports further text processing as well as reading facilitation and the creation of reading environments [BC10]. Typically, readers of Ancient Greek or Syriac face the challenge of a complex and sometimes obscure language, but also of very distant cultural references, which go back to events and traditions that may not be immediately comprehensible to a nonspecialist. Names are a substantial part of this challenge. Texts like the Bible or the Iliad contain hundreds, if not thousands, of names, many of which appear only once or twice, and even language specialists are sometimes unable to immediately recognize certain places, people, or events. An automatic Named Entity Recognition (NER) model for this category of texts can enormously facilitate the task of a reader, by extracting, classifying and linking ancient names to available resources [Kem21], and by using those names to design applications that encourage different approaches to textual and historical exploration [BT]. Nevertheless, in the domain of ancient languages, NER is a very complicated task. The lack of an adequate infrastructure and of annotated data is the main obstacle to the development of reliable NER pipelines, as many of these languages have few or no services for annotation, lemmatization, morphosyntactic analysis, and named entity classification. Many resources of this kind have been developed, for example, for Ancient Greek and Latin [Bur19], but the lack of annotated texts in the original languages still makes Named Entity Recognition a challenge for many types of texts [EBJ+16]. There is ample literature on the topic of NER in ancient languages, particularly Ancient Greek and Latin.
[BC10] and [BBC+07] emphasize the importance of processing historical texts for names in the broader field of digital infrastructures for reading and annotation, such as the Perseus Digital Library and the Scaife Viewer. The Classical Language Toolkit (CLTK) is the largest Python library to perform NLP tasks on ancient languages, including NER [JBS+21]; however, the lack of adequately annotated datasets for most corpora fundamentally hinders performance on this task [PKM20]. Other efforts have been made starting from large annotated datasets of specific sources, using semantic annotation platforms and Machine Learning [Ber19]. This paper presents our work on training two automatic NER models for Ancient Greek using transformer-based models. The models classify entities into three categories (Person, Location, and Miscellaneous) and achieved promising results on test and evaluation datasets. The models are available on Hugging Face.
Landscape Ecology, 2023
Context: Consideration of historical maps for ecological research requires a bidirectional understanding of human-nature relationships. We investigated shifting environmental values, as they emerge from historical maps of the American Southeast through the eighteenth century, with an interdisciplinary approach combining epistemologies from ecology, anthropology, and history. Objectives: Our objectives were to investigate the diversity of land use notations on the maps, contextualize these notations within existing knowledge of how power relations shape environmental attitudes and values, and interpret what their representations suggest about human attitudes toward the land. Methods: We selected 14 maps created between 1711 and 1773. We georeferenced the corpus in ArcGIS Pro. We identified emergent map themes with MAXQDA. Using R, we investigated the resulting patterns within and across the corpus in light of the context and purpose of the maps. Results: Maps reveal the values held by the colonizers towards various aspects of the land. Natural features are emphasized for their perceived value in direct use. Shifts in the balance between managed/built land and natural land indicate changes in the colonization agenda, going from depicting a well-equipped infrastructure for initial settlers to emphasizing the potential for resource extraction. Conclusions: Our study shows how the perception of environmental values changes depending on time period, power structures, and agenda. An interdisciplinary approach, which considers the social aspects alongside the ecological ones, can provide a holistic understanding of these dynamics and help us better understand how humans perceive their presence and role in the environment.
Digital Scholarship in the Humanities, 2023
This article presents a study of several parallel corpora of historical languages and their translations. The aligned corpora are the result of a large crowdsourcing project, named Ugarit, aimed at supporting translation alignment for ancient and historical languages: the study of the resulting translation pairs allows us to observe cross-linguistic dynamics in a range of languages, some of which have never been systematically aligned before. The corpora considered are divided into two distinct groups: English translations of ancient languages, including Greek, Latin, Persian, and Coptic; and translations of Ancient Greek into other languages, including Latin, English, Georgian, Italian, and Persian. We evaluated different ratios of word matching across each language pair (one-to-one, one-to-many, many-to-one, and many-to-many), and analyzed the resulting trends across the corpus. We offer some observations on how and why different types of alignment links are established in a given language pair, and what factors affect their creation beyond the control of the user. We propose two complementary hypotheses to explain the changes, one based on structural linguistic factors and the other based on cultural difference.
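The four link types named in the abstract can be illustrated with a short sketch. It assumes that each alignment link is stored as a pair of token lists (source tokens, target tokens); that representation, the function names, and the sample links are hypothetical and are not taken from the Ugarit data model.

```python
from collections import Counter

LINK_TYPES = ("one-to-one", "one-to-many", "many-to-one", "many-to-many")


def link_type(source_tokens, target_tokens):
    """Classify one alignment link by how many tokens it joins on each side."""
    source_side = "one" if len(source_tokens) == 1 else "many"
    target_side = "one" if len(target_tokens) == 1 else "many"
    return f"{source_side}-to-{target_side}"


def link_type_ratios(links):
    """Return the share of each link type among all links (0.0 if there are none)."""
    counts = Counter(link_type(source, target) for source, target in links)
    total = sum(counts.values())
    return {name: (counts[name] / total if total else 0.0) for name in LINK_TYPES}


# Illustrative links only (Greek source, English target); not real project data.
example_links = [
    (["μῆνιν"], ["wrath"]),
    (["ἄειδε"], ["sing"]),
    (["θεά"], ["O", "goddess"]),
    (["Πηληϊάδεω", "Ἀχιλῆος"], ["of", "Achilles", "son", "of", "Peleus"]),
]
ratios = link_type_ratios(example_links)
```

Comparing these ratios across language pairs is one simple way to make the trends described above measurable.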
Journal of Computational Literary Studies 1.1, 2022
This paper presents a workflow to systematically compare translations of Ancient Greek into English and Persian through the analysis of parallel corpora aligned manually at word level. We extracted the translation pairs, measured word intersections, alignment types, and part of speech matches, in order to investigate quantitative indicators of closeness to the original and similarity across translations. The corpus includes passages from the Iliad and the Hippolytus by Euripides. In addition to direct translations, we have included some indirect translations of the Iliad in Persian, where French was used as a mediating language.
Information 13.2:65, 2022
UGARIT is a public web-based tool for manual annotation of parallel texts for generating word-level translation alignment. We aimed to develop a user-friendly interactive interface to visualize aligned texts and collect training data in the form of translation pairs to be used later (i) for training an automatic translation alignment system for historical languages at the word/phrase level, and (ii) as a gold standard to evaluate automatic alignment and machine translation systems. UGARIT is now widely used for learning new languages, especially historical languages, and as a reading environment for parallel texts. In the following sections, we present related work and similar projects; then, we give an overview of the visualization techniques used to present the alignment results. Further, we explain how we derive the translation graph from the aligned translation pairs. Finally, we discuss the usage limitations of UGARIT, possible improvements, and future development plans.
IJHAC: International Journal of Humanities and Arts Computing. 15.1-2, 2021
This article presents a case study for the digital mapping of an ancient Greek geographical compendium, the Sketch of Geography by Agathemerus. We examine various possibilities of investigation, including semantic annotation, georeferencing and network analysis, to verify how the digital mapping of a text can contribute to a better understanding of its underlying spatial perception. We examine the following aspects: spatial distribution, functionality and frequency of place types, semantic/symbolic definition of boundaries, place connectivity and problems of textual corruption. In the conclusion, we show that, while the general perspective of the work is programmatically speculative, Agathemerus' way of modelling the world is navigational and pragmatic. A predominantly non-cartographic perspective dictates a way of reasoning about space that is highly semantic in the definition of important landmarks and spatial relations. However, it also determines a strongly navigational approach in the treatment of geographical problems. Finally, we emphasize the value of an integrated semantic and mapped approach to the investigation of premodern geographies, and the opportunities of using these methods to address old and new research questions.
Digital Humanities Quarterly 15.3, 2021
This paper proposes text alignment in digital environments as a way to empower language learning. It presents the principles and goals of text alignment in Natural Language Processing, and introduces Ugarit, a web-based translation alignment editor for the collection of aligned language pairs. Then, it reports observations on the application of translation alignment in historical language courses at Tufts and Furman University between 2017 and 2019.
http://www.digitalhumanities.org/dhq/vol/15/3/000563/000563.html
Berichte. Geographie und Landeskunde 94.2, 2021
Get in touch with me for a copy of this paper.
This paper discusses the problem of modeling descriptive geographies of the premodern world, with a focus on Greco-Roman sources. As premodern way-finding mechanisms are essentially unmapped, the importance of language and narrative becomes fundamental to spatial understanding: the first part of this paper proposes a discussion on the linguistic-expressive patterns that provide the foundation to spatial narratives. These include, at the most basic level, expressions of distance, orientation, and semantic/conceptual classifications. The second part of the paper addresses the problem of classifying regular expressive patterns through modeling: modeling is used as a strategy to better understand such linguistic-expressive phenomena, and as a hermeneutic exercise of application of computational methods to humanities data. The conclusion touches upon the idea of visual representations of premodern spatial narratives and discusses its challenges and advantages.
Journal of Interactive Technology and Pedagogy, 2021
This paper illustrates the application of translation alignment technologies to empower a new approach to reading in digital environments. Digital platforms for manual translation alignment are designed to facilitate a particularly intensive and philological experience of the text, which is traditionally peculiar to the teaching and study of Classical languages. This paper presents the results of the experimental use of translation alignment in the context of Classical language teaching, and shows how the use of technology can empower a meaningful and systematic approach to information. We argue that translation alignment and similar technologies can open entirely new perspectives on reading practices, going beyond the opposed categories of “skimming” and traditional linear reading, and potentially overcoming the cognitive limitations connected with the consumption of digital content.
Linha D’Água, 2020
Abstract: Digital editions in Classics have emerged as allies in breaking down barriers and meeting challenges in the teaching of classical languages and cultures. The aim of this article is to highlight some specific characteristics of digital editions in Classics as possible genre elements in their digital production and reading, on the assumption that they can contribute to mobilizing the learning of classical languages and to teacher training in the field. On the one hand, we describe the essential components in the process of creating a new digital edition of the Greek text of Sophocles' Oedipus Rex, prepared at Furman University for an undergraduate course. On the other, we show how the annotations of digital editions, in the form of alignments, treebanking, and geo-annotations, are, as learning experiences, potential instruments of development.
Futuro Classico 4, 2018, pp. 149-177
Geographical and spatial descriptions in the premodern world are structurally different from those of the modern era, where spatial understanding is based on cartographic navigation. This paper presents an experimental process to tag, retrieve, and identify geographical information as described in premodern primary sources, together with the issues and possible solutions. The proposed method defines specific categories of geographical information and a markdown system to mark these categories in the source. Once the data is tagged, we extract it and identify geographical locations and their connections through a heuristic approach: the extracted geographical entities are initially aligned with existing geographical references and secondary sources. String similarity approaches might provide fuzzy identifications which need to be verified and disambiguated. In this paper, we describe the process of annotation and extraction of geographical descriptions, experiment with some toponym matching metrics, report the results, and offer possible solutions to handle disambiguation through the existing contextual information in the source. The process is applied to two different datasets, proposed as test cases: a classical Arabic geographical text and a Roman itinerary.
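As a rough illustration of fuzzy toponym matching, the sketch below ranks gazetteer entries by string similarity to an extracted place name. The abstract does not name the metrics that were tested, so the `SequenceMatcher` ratio from Python's standard `difflib` module is a stand-in. The function name, the threshold, and the sample gazetteer are assumptions made for this example.

```python
from difflib import SequenceMatcher


def match_toponym(name, gazetteer, threshold=0.75):
    """Return (entry, score) pairs at or above the threshold, best match first.

    The score is difflib's similarity ratio on lowercased strings, used here
    as a placeholder for whatever metrics the paper actually evaluates.
    """
    scored = []
    for entry in gazetteer:
        score = SequenceMatcher(None, name.lower(), entry.lower()).ratio()
        if score >= threshold:
            scored.append((entry, score))
    # Sort by descending score; ties keep gazetteer order because sort is stable.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical gazetteer and query, for illustration only.
sample_gazetteer = ["Placentia", "Mediolanium", "Bononia", "Mediolanum"]
candidates = match_toponym("Mediolano", sample_gazetteer)
matched_names = [entry for entry, _ in candidates]
```

A query can return several candidates above the threshold, which is why fuzzy identifications still need to be verified and disambiguated against the context of the source.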
This paper proposes a methodology to address the problem of the representation of Greek and Roman geography in the digital environment. As classical geography was not only a graphic representation of the world, but a multi-layered cultural system based on specific notions and concepts, it is now necessary to go beyond the taxonomy of place-names and their visualization on modern maps. The interpretation of ancient geography as a ‘mental model’ implies the importance of different and complementary aspects which should be addressed systematically: the expression of distances, the language of spatial orientation, the definition of environmental landmarks. For each of these aspects an integrated digital methodology is proposed, either implementing existing infrastructures or focusing on new strategies. The conclusion establishes a workflow to be tested on the corpus of the Geographi Graeci Minores, and extended to a variety of other texts.
Proceedings by Chiara Palladino
DH2023 Book of Abstracts, 2023
We present a new Named Entity Recognition tool for Ancient Greek texts which leverages a multilingual automatic translation alignment model.
LaTeCH-CLfL - Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 2023
In this study, we demonstrate how to apply cross-lingual annotation projection to transfer named-entity annotations to classical languages for which limited or no resources and annotated texts are available, aiming to enrich their NER training datasets and train a model to perform NER tagging. Our approach employs sentence-level aligned corpora of ancient texts and their translations into a modern language for which high-quality off-the-shelf NER systems are available. We automatically annotate the text of the modern language and employ a state-of-the-art neural word alignment system to find translation equivalents. Finally, we transfer the annotations to the corresponding tokens in the ancient texts using a direct projection heuristic. We applied our method to Ancient Greek and Latin using the Bible with the English translation as a parallel corpus. We used the resulting annotations to enhance the performance of an existing NER model for Ancient Greek.
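The projection step can be sketched in a few lines. This is a minimal illustration under stated assumptions: word alignment is given as (modern index, ancient index) pairs, labels are plain tags with "O" for non-entities, and when several modern tokens align to one ancient token the first entity label wins. The function name, the tie-breaking rule, and the sample sentence are hypothetical and may differ from the heuristic used in the paper.

```python
def project_labels(modern_labels, alignment, ancient_length, outside="O"):
    """Copy entity labels from modern-language tokens to aligned ancient tokens.

    modern_labels  -- one NER label per modern token, e.g. ["PER", "O", ...]
    alignment      -- iterable of (modern_index, ancient_index) pairs
    ancient_length -- number of tokens in the ancient sentence
    Unaligned ancient tokens keep the `outside` label.
    """
    projected = [outside] * ancient_length
    for modern_index, ancient_index in alignment:
        label = modern_labels[modern_index]
        # Assumed tie-break: keep the first entity label that reaches a token.
        if label != outside and projected[ancient_index] == outside:
            projected[ancient_index] = label
    return projected


# Made-up sentence pair for illustration; the Greek word order differs from English.
modern_tokens = ["Paul", "went", "to", "Rome"]
modern_labels = ["PER", "O", "O", "LOC"]
ancient_tokens = ["ἦλθεν", "Παῦλος", "εἰς", "Ῥώμην"]
alignment = [(0, 1), (1, 0), (2, 2), (3, 3)]

ancient_labels = project_labels(modern_labels, alignment, len(ancient_tokens))
```

Because the labels travel along alignment links rather than token positions, differences in word order between the two languages do not affect the result.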
https://aclanthology.org/2023.latechclfl-1.19
Graph2020. Graph Technologies in the Humanities – CEUR Workshop Proceedings, 2022
This paper introduces Codex, a digital environment for modeling the practice of scholarly annotation of textual information. Codex is a text-as-graph solution that integrates a text-as-graph meta model via standoff property annotations. Codex has three distinctive characteristics. First, all standoff property annotations are connected to the user who created or edited them. Second, the affordances of standoff annotation in Codex's real-time standoff property text editor module (SPEEDy) make it convenient for users to overlay and overlap annotations, allowing for a multitude of interpretations to be captured without conflict. Third, the Statement meta model enables the user to formulate RDF-like assertions which can be applied to multiple entities. The use of trait statements constitutes a move from a 'class-based' ontology toward an 'aspect-oriented' ontology of discrete traits or qualities, which produces a classification that is both finer and more capable of functioning as a doxography that connects ontological claims to the historical sources from which they are derived. In this paper, we demonstrate a number of practical applications for Codex, focusing on how this solution contributes to the pragmatics of modeling the scholarly understanding of textual information.
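The general idea of standoff annotation, in which annotations live outside the text and point to character offsets, can be shown with a small generic sketch. It is not the Codex or SPEEDy data model: the class, its fields, the helper function, and the sample annotations are all hypothetical, chosen only to show that overlapping annotations by different users can coexist without conflict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffProperty:
    """One annotation stored apart from the text it describes (hypothetical schema)."""
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    kind: str    # e.g. "entity" or "theme"
    value: str
    author: str  # the user who created the annotation


def properties_at(properties, offset):
    """Return every annotation whose span covers the given character offset."""
    return [p for p in properties if p.start <= offset < p.end]


text = "Sing, goddess, the wrath of Achilles"
properties = [
    # Two users annotate overlapping spans; the text itself is never modified.
    StandoffProperty(text.index("Achilles"), len(text), "entity", "person", "user_a"),
    StandoffProperty(text.index("wrath"), len(text), "theme", "anger", "user_b"),
]

covering_name = properties_at(properties, text.index("Achilles"))
covering_wrath = properties_at(properties, text.index("wrath"))
```

Since the text is left untouched, any number of overlapping interpretations can be layered on the same passage and filtered by author or kind.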
Graph2020. Graph Technologies in the Humanities – CEUR Workshop Proceedings 2020. Edited by T. Andrews, F. Diehr, T. Efer, A. Kuczera, J. van Zundert. Pp. 150-171. ISSN 1613-0073. http://ceur-ws.org/Vol-3110/paper8.pdf
LREC2022 Proceedings of the 13th Conference on Language Resources and Evaluation, 2022
This paper illustrates a workflow for developing and evaluating automatic translation alignment models for Ancient Greek. We designed an annotation Style Guide and a gold standard for the alignment of Ancient Greek-English and Ancient Greek-Portuguese, measured inter-annotator agreement and used the resulting dataset to evaluate the performance of various translation alignment models. We proposed a fine-tuning strategy that employs unsupervised training with mono- and bilingual texts and supervised training using manually aligned sentences. The results indicate that the fine-tuned model based on XLM-RoBERTa is superior in performance, and it achieved good results on language pairs that were not part of the training data.
LREC 2022 Second Workshop on Language Technologies for Historical and Ancient Languages LT4HALA 2022, 2022
This paper presents the results of automatic translation alignment experiments on a text corpus in Ancient Greek translated into Latin. We used a state-of-the-art alignment workflow based on a contextualized multilingual language model that is fine-tuned on the alignment task for Ancient Greek and Latin. The model is fine-tuned on monolingual Ancient Greek texts, bilingual parallel datasets, and manually aligned sentences. The performance of the alignment model is evaluated on an alignment gold standard dataset consisting of 100 parallel fragments aligned manually by two domain experts, with a 90.5% Inter-Annotator-Agreement (IAA). An interactive online interface is provided to enable users to explore the aligned fragments collection and examine the alignment model's output.
Forthcoming in the proceedings.
https://digitaltolkien.com/2022/04/08/modeling-names-part-one.html
Advisor: Prof. Luciano Canfora, Università degli Studi di Bari
https://classicalstudies.org/scs-blog/chiara-palladino/review-orbis-stanford-geospatial-network-model-roman-world