The current digital turn in studying and analyzing historical documents results both in machine-actionable cultural data and in software able to process them. However, these data and services often lack integration strategies that would allow them to be reused in contexts different from the original ones. As Franz Fischer pointed out in a noteworthy article: “There is no out-of-the-box software available for creating truly critical and truly digital editions at the same time” [1]. Likewise, Monica Berti stated that it is now important to “build a model for representing quotations and text reuses of lost works in a digital environment” [2]. In this vision, Bridget Almas is in charge of developing an integrated platform for collaboratively transcribing, editing, and translating historical documents and texts. She claimed that through this platform, called Perseids, students and scholars are able to create open-source digital scholarly editions [3]. A numbe...
The paper focuses on the automatic extraction of domain knowledge from Italian legal texts and presents a fully implemented ontology learning system (T2K, Text-2-Knowledge) that includes a battery of tools for Natural Language Processing, statistical text analysis, and machine learning. Evaluation results show the considerable potential of systems like T2K, which exploit an incremental interleaving of NLP and machine learning techniques for accurate, large-scale, semi-automatic extraction and structuring of domain-specific knowledge.
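The abstract does not detail T2K's internals, but the terminology-extraction stage of such a pipeline can be illustrated with a minimal sketch: collect frequent word n-grams from a text as candidate domain terms. This is a toy stand-in, not the actual T2K implementation, which additionally filters candidates through POS patterns and statistical association measures.

```python
from collections import Counter
import re

def extract_candidate_terms(text, max_len=3, min_freq=2):
    """Collect frequent word n-grams as candidate domain terms.

    A simplified stand-in for the terminology-extraction step of an
    ontology-learning pipeline; real systems also apply linguistic
    filters (POS patterns) and association measures.
    """
    tokens = re.findall(r"[a-zà-ù]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    # Keep only candidates that recur often enough to be term-like.
    return [(t, c) for t, c in counts.most_common() if c >= min_freq]

sample = ("Il contratto di lavoro subordinato. Il contratto di lavoro "
          "a tempo determinato. Il lavoro subordinato.")
terms = extract_candidate_terms(sample)
```

On this toy legal snippet, recurring multi-word candidates such as "contratto di lavoro" surface with their frequencies, which a full system would then rank and structure into a domain vocabulary.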
One of the main challenges of the DH community is to provide suitable software models and tools. To model the literary domain and the related user requirements, we chose to follow the engineering principles of object-oriented analysis and design. The digital representation of a textual resource is a challenge, as it involves several theoretical and epistemological issues in semiotics, paleography, philology, linguistics, engineering, and computer science. We have designed and implemented a set of core entities as the fundamental data types shared among all the components of the environment.
Abstract. The domain adaptation task was aimed at investigating techniques for adapting state-of-the-art dependency parsing systems to new domains. Both the language dealt with, i.e. Italian, and the target domain, namely the legal domain, represent two main novelties of the task organised at Evalita 2011. In this paper, we define the task and describe how the datasets were created from different resources. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results.
This paper introduces the POESIA internet filtering system, which is open source and combines standard filtering methods, such as positive/negative URL lists, with more advanced techniques, such as image processing and NLP-enhanced text filtering. The description here focuses on components providing textual content filtering for three European languages (English, Italian, and Spanish), employing NLP methods to enhance performance. We also address the acquisition of language data needed to develop these filters, and the evaluation of the system and its components.
The inclusion of semantic features in the stylometric analysis of literary texts appears to be poorly investigated. In this work, we experiment with the application of Distributional Semantics to a corpus of Italian literature to test whether word distribution can convey stylistic cues. To verify our hypothesis, we set up an Authorship Attribution experiment. Indeed, the results we obtained suggest that the style of an author can reveal itself through word distribution as well.
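The core of such an Authorship Attribution experiment can be sketched with a deliberately simplified model: represent each author by a word-frequency profile and attribute an unknown text to the nearest profile by cosine similarity. This is only an illustrative baseline under assumed toy data; the paper's actual distributional-semantic representations are richer than raw frequency vectors.

```python
import math
import re
from collections import Counter

def profile(text):
    """Relative word-frequency vector of a text: a crude stand-in
    for richer distributional representations."""
    words = re.findall(r"\w+", text.lower())
    n = len(words)
    return {w: c / n for w, c in Counter(words).items()}

def cosine(p, q):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def attribute(unknown, candidates):
    """Nearest-neighbour attribution: pick the candidate author whose
    profile is most similar to the unknown text."""
    u = profile(unknown)
    return max(candidates, key=lambda a: cosine(u, profile(candidates[a])))

# Toy corpora (Dante verses used purely as illustrative data).
candidates = {
    "A": "nel mezzo del cammin di nostra vita mi ritrovai per una selva oscura",
    "B": "la gloria di colui che tutto move per l universo penetra e risplende",
}
unknown = "mi ritrovai per una selva oscura che la diritta via era smarrita"
guess = attribute(unknown, candidates)
```

With these toy texts the unknown fragment shares far more vocabulary with author "A", so the nearest-profile rule assigns it accordingly.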
Within the “Museo Virtuale della Musica BellinInRete” project, a corpus of letters written by the renowned Catania-born composer Vincenzo Bellini (1801–1835) will be encoded and made publicly available. This contribution aims at illustrating the part of the project regarding the implementation of the prototype for the metadata and text encoding, indexing, and visualisation of Bellini’s correspondence. The encoding scheme has been defined according to the latest guidelines of the Text Encoding Initiative and has been instantiated on a sample of letters. Contextually, a first environment has been implemented by customizing two open-source tools: Edition Visualization Technology and the Omega scholarly platform. The main objective of the digital edition is to engage the general public with the cultural heritage held by the Belliniano Civic Museum of Catania. This wide access to Bellini’s correspondence has been conceived preserving the scholarly transcriptions of the letters edited by S...
Background: Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results: This article provides an overvi...
This article illustrates the first steps towards the implementation of a Decision Support System aimed at recreating a research environment for scholars and providing them with computational tools to assist in the processing and interpretation of texts. While outlining the general characteristics of the system, the paper presents a minimal set of user requirements and provides a possible use case on Dante’s Inferno.
Abstract. The management and exchange of multimedia data is a challenging area of research due to the variety of formats and standards and the many interesting intended applications. Semantic web technologies are very promising for enabling interoperability and integration of media. ...
Proc. of the 3rd Italian Semantic Web Workshop-SWAP 2006, 2006
Abstract—The demand for efficient methods for extracting knowledge from multimedia content has led to a growing research community investigating the convergence of multimedia and knowledge technologies. In this paper we describe a methodology for extracting multimedia information from product catalogues, empowered by the synergetic use and extension of a domain ontology. The methodology was implemented in the Trade Fair Advanced Semantic Annotation Pipeline of the VIKE-framework. Index Terms—Semantic ...
A formal digital structuring of the terminology of the Talmud is being carried out in the context of the Project for the Translation of the Babylonian Talmud into Italian. Following the principles of Meaning-Text Theory, the terminological resource was encoded in the form of a multilingual Explanatory Combinatorial Dictionary (Hebrew-Aramaic-Italian). The construction of this resource was supported by text processing and computational linguistics techniques aimed at automatically extracting terms from the Italian translation of the Talmud and aligning them with the corresponding Hebrew/Aramaic source terms. The paper describes the process that was set up for constructing the terminological resource, with the ultimate goal of illustrating the advantages of adopting a formal linguistic model. The terminological resource aims to be a useful tool for investigating in depth the characteristics of the languages of the Talmud, for helping translators in their work and, more generally, scholars in their study of the Talmud itself.
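The extract-and-align step described above can be illustrated with a minimal sketch: given translation-aligned segments, score candidate source/target term pairs by how consistently they co-occur, using the Dice coefficient. The data and threshold below are illustrative assumptions, not the project's actual method or terms.

```python
from collections import Counter

def dice_alignments(segment_pairs, min_dice=0.6):
    """Rank candidate translation pairs by the Dice coefficient of
    their co-occurrence across aligned segments: a toy version of a
    term-alignment step, not the project's actual pipeline."""
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_terms, tgt_terms in segment_pairs:
        for s in set(src_terms):
            src_freq[s] += 1
        for t in set(tgt_terms):
            tgt_freq[t] += 1
        # Count every source/target term pairing within a segment.
        for s in set(src_terms):
            for t in set(tgt_terms):
                pair_freq[(s, t)] += 1
    scores = {(s, t): 2 * c / (src_freq[s] + tgt_freq[t])
              for (s, t), c in pair_freq.items()}
    return {pair: d for pair, d in scores.items() if d >= min_dice}

# Hypothetical aligned segments: (source terms, Italian terms).
pairs = [
    (["halakhah"], ["norma"]),
    (["halakhah", "mishnah"], ["norma", "mishnà"]),
    (["mishnah"], ["mishnà"]),
]
aligned = dice_alignments(pairs)
```

Terms that always appear together across segments score 1.0 and survive the threshold, while incidental pairings from the mixed segment score lower and are filtered out.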
The automatic linguistic analysis of ancient Hebrew represents a new research opportunity in the field of Jewish studies. In fact, very little has been produced so far, both in terms of linguistic resources and, above all, of tools for the analysis of ancient Hebrew. This article illustrates work carried out within the Italian Translation of the Babylonian Talmud Project, aimed at the construction of an automatic linguistic annotator of Mishnaic Hebrew.
In the context of the Project for the Translation of the Babylonian Talmud into Italian (PTTB), a formal digital structuring of the terminology is being carried out. The terminological resource has been encoded in the form of a multilingual (Hebrew-Aramaic-Italian) Explanatory Combinatorial Dictionary, following the principles of Meaning-Text Theory. The construction of this resource was supported by text processing and computational linguistics techniques aimed at automatically extracting terms from the Italian translation of the Talmud and aligning them with the corresponding Hebrew/Aramaic source terms. The article describes the process set up for constructing the terminological resource, with the ultimate goal of illustrating the advantages of adopting a formal linguistic model. Indeed, the terminological resource aims to be a useful tool for investigating in depth the characteristics of the languages of the Talmud, for helping translators in their work and, more generally, the wide community of Talmud scholars.
The Literary Computing group of the Institute for Computational Linguistics at the National Research Council of Italy (ILC-CNR) is carrying out a line of research on designing software models for textual scholarship, as well as on implementing them using cutting-edge software engineering approaches and technologies. This research work is aimed at providing a general framework (called Omega) [4] - inherently conceived with the object-oriented paradigm and semantic web technologies - suitable for studying historical and literary documents and texts.