Active Informed Consent to Boost the Application of Machine Learning in Medicine

Authors: Marco Gerardi, Katarzyna Barud, Marie-Catherine Wagner, Nikolaus Forgo, Francesca Fallucchi, Noemi Scarpato, Fiorella Guadagni, Fabio Massimo Zanzotto

Abstract: Machine Learning may push research in precision medicine to unprecedented heights. To succeed, machine learning needs a large amount of data, often including personal data. Therefore, machine learning applied to precision medicine is on a cliff edge: if it does not learn to fly, it will deeply fall down. In this paper, we present Active Informed Consent (AIC) as a novel hybrid legal-technological tool to foster the gathering of a large amount of data for machine learning. We carefully analyzed the compliance of this technological tool to the legal intricacies protecting the privacy of European Citizens.

Submitted 27 September, 2022; originally announced October 2022.

ACM Class: J.3

arXiv:2201.05613

doi 10.26615/978-954-452-092-2_102

The Dark Side of the Language: Pre-trained Transformers in the DarkNet

Authors: Leonardo Ranaldi, Aria Nourbakhsh, Arianna Patrizi, Elena Sofia Ruzzetti, Dario Onorati, Francesca Fallucchi, Fabio Massimo Zanzotto

Abstract: Pre-trained Transformers are challenging human performances in many NLP tasks. The massive datasets used for pre-training seem to be the key to their success on existing tasks. In this paper, we explore how a range of pre-trained Natural Language Understanding models perform on definitely unseen sentences provided by classification tasks over a DarkNet corpus. Surprisingly, results show that syntactic and lexical neural networks perform on par with pre-trained Transformers even after fine-tuning. Only after what we call extreme domain adaptation, that is, retraining with the masked language model task on all the novel corpus, pre-trained Transformers reach their standard high results. This suggests that huge pre-training corpora may give Transformers unexpected help since they are exposed to many of the possible sentences.

Submitted 17 November, 2023; v1 submitted 14 January, 2022; originally announced January 2022.

Report number: 2023.ranlp-1.102

Journal ref: 2023.ranlp-1.102

arXiv:2109.11763

doi 10.18653/v1/2022.findings-acl.208

Lacking the embedding of a word? Look it up into a traditional dictionary

Authors: Elena Sofia Ruzzetti, Leonardo Ranaldi, Michele Mastromattei, Francesca Fallucchi, Fabio Massimo Zanzotto

Abstract: Word embeddings are powerful dictionaries, which may easily capture language variations. However, these dictionaries fail to give sense to rare words, which are surprisingly often covered by traditional dictionaries. In this paper, we propose to use definitions retrieved in traditional dictionaries to produce word embeddings for rare words. For this purpose, we introduce two methods: Definition Neural Network (DefiNNet) and Define BERT (DefBERT). In our experiments, DefiNNet and DefBERT significantly outperform state-of-the-art as well as baseline methods devised for producing embeddings of unknown words. In fact, DefiNNet significantly outperforms FastText, which implements a method for the same task based on n-grams, and DefBERT significantly outperforms the BERT method for OOV words. Then, definitions in traditional dictionaries are useful to build word embeddings for rare words.

Submitted 24 September, 2021; originally announced September 2021.

Journal ref: Findings of the Association for Computational Linguistics: ACL 2022

Showing 1–3 of 3 results for author: Fallucchi, F