We present models that complete missing text given transliterations of ancient Mesopotamian documents, originally written on cuneiform clay tablets (2500 BCE-100 CE). Because the tablets have deteriorated, scholars often rely on contextual cues to fill in missing parts of the text manually, a subjective and time-consuming process. We observe that this challenge can be formulated as a masked language modelling task, which is used mostly as a pretraining objective for contextualized language models. We then develop several architectures focusing on Akkadian, the lingua franca of the time. We find that, despite data scarcity (1M tokens), we can achieve state-of-the-art performance on missing-token prediction (89% hit@5) using a greedy decoding scheme and pretraining on data from other languages and different time periods. Finally, we conduct human evaluations showing the applicability of our models in assisting experts to transcribe texts in extinct languages.
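The hit@5 figure quoted in the abstract is presumably the usual hit@k metric: the fraction of masked positions for which the correct token appears among the model's top k ranked candidates. The abstract does not give the authors' evaluation code, so the following is only a minimal sketch of that general definition; the function name `hit_at_k` and the toy tokens are hypothetical and not taken from the paper.

```python
from typing import Sequence


def hit_at_k(
    ranked_candidates: Sequence[Sequence[str]],
    gold_tokens: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of masked positions whose gold token is in the top-k candidates.

    ranked_candidates[i] holds the candidates for masked position i,
    ordered from most to least likely. This is an illustrative helper,
    not the evaluation code used in the paper.
    """
    if len(ranked_candidates) != len(gold_tokens):
        raise ValueError("one candidate list is required per gold token")
    if not gold_tokens:
        return 0.0
    hits = sum(
        1
        for candidates, gold in zip(ranked_candidates, gold_tokens)
        if gold in candidates[:k]
    )
    return hits / len(gold_tokens)


# Toy placeholder data (hypothetical, not from any corpus):
# two masked positions, three ranked guesses each.
predictions = [["tok_a", "tok_b", "tok_c"], ["tok_x", "tok_y", "tok_z"]]
gold = ["tok_b", "tok_q"]
score = hit_at_k(predictions, gold, k=2)
```

Under this definition, a score of 89% at k = 5 would mean that for 89% of the masked tokens the correct restoration was among the five highest-ranked suggestions, which fits the stated use case of offering an expert a short list to choose from.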
Advanced Information and Communication Technologies, combined with the development of applications based on artificial intelligence, open new possibilities for investigating Cultural Heritage (CH) in depth. The main objective of this process is to promote integrated knowledge of CH within its context, so that it becomes a factor of growth in the cultural, social and economic system of specific geographical areas. Through GRID computing, distributed databases can be accessed directly over the web, creating a network of different archives. In addition, virtual reality reconstruction of areas and ontologies, supplementary capabilities designed to support intelligent use and multilingualism, have been extensively investigated for some time, with significant results. The archaeological heritage has been the subject of investigation and study at ENEA-UTICT in a number of project activities. This paper intends to propose conceptual and methodological reflections for a fruitful int...
Advanced Information and Communication Technologies, combined with the development of applications based on artificial intelligence, open new possibilities for investigating Cultural Heritage (CH) in depth. The main objective of this process is to promote integrated knowledge of CH within its context, so that it becomes a factor of growth in the cultural, social and economic system of specific geographical areas. GRID computing makes it possible to access distributed databases directly over the web, creating a network of different archives. Virtual reality reconstruction of areas and ontologies, additional capabilities designed to support intelligent use and multilingualism, have also been extensively investigated for some time, with significant results. The archaeological heritage has been the subject of investigation and study at ENEA-UTICT in a number of project activities. This contribution intends to propose conceptual and methodological reflections for a fruitful in...