-
Active Informed Consent to Boost the Application of Machine Learning in Medicine
Authors:
Marco Gerardi,
Katarzyna Barud,
Marie-Catherine Wagner,
Nikolaus Forgo,
Francesca Fallucchi,
Noemi Scarpato,
Fiorella Guadagni,
Fabio Massimo Zanzotto
Abstract:
Machine Learning may push research in precision medicine to unprecedented heights. To succeed, machine learning needs a large amount of data, often including personal data. Therefore, machine learning applied to precision medicine is on a cliff edge: if it does not learn to fly, it will fall a long way. In this paper, we present Active Informed Consent (AIC) as a novel hybrid legal-technological tool to foster the gathering of a large amount of data for machine learning. We carefully analyzed the compliance of this technological tool with the legal intricacies protecting the privacy of European citizens.
Submitted 27 September, 2022;
originally announced October 2022.
-
The Dark Side of the Language: Pre-trained Transformers in the DarkNet
Authors:
Leonardo Ranaldi,
Aria Nourbakhsh,
Arianna Patrizi,
Elena Sofia Ruzzetti,
Dario Onorati,
Francesca Fallucchi,
Fabio Massimo Zanzotto
Abstract:
Pre-trained Transformers are challenging human performance in many NLP tasks. The massive datasets used for pre-training seem to be the key to their success on existing tasks. In this paper, we explore how a range of pre-trained Natural Language Understanding models perform on certainly unseen sentences provided by classification tasks over a DarkNet corpus. Surprisingly, results show that syntactic and lexical neural networks perform on par with pre-trained Transformers even after fine-tuning. Only after what we call extreme domain adaptation, that is, retraining with the masked language model task on the entire novel corpus, do pre-trained Transformers reach their standard high results. This suggests that huge pre-training corpora may give Transformers unexpected help since they are exposed to many of the possible sentences.
Submitted 17 November, 2023; v1 submitted 14 January, 2022;
originally announced January 2022.
-
Lacking the embedding of a word? Look it up into a traditional dictionary
Authors:
Elena Sofia Ruzzetti,
Leonardo Ranaldi,
Michele Mastromattei,
Francesca Fallucchi,
Fabio Massimo Zanzotto
Abstract:
Word embeddings are powerful dictionaries, which may easily capture language variations. However, these dictionaries fail to give sense to rare words, which are surprisingly often covered by traditional dictionaries. In this paper, we propose to use definitions retrieved in traditional dictionaries to produce word embeddings for rare words. For this purpose, we introduce two methods: Definition Neural Network (DefiNNet) and Define BERT (DefBERT). In our experiments, DefiNNet and DefBERT significantly outperform state-of-the-art and baseline methods devised for producing embeddings of unknown words. In fact, DefiNNet significantly outperforms FastText, which implements a method for the same task based on n-grams, and DefBERT significantly outperforms the BERT method for OOV words. Thus, definitions in traditional dictionaries are useful to build word embeddings for rare words.
Submitted 24 September, 2021;
originally announced September 2021.