in Nuove inchieste sull'Epistola a Cangrande (study seminar), Pisa, 18 December 2018, Palazzo Matteucci.
QuaPy is an open-source framework for Quantification (a.k.a. Supervised Prevalence Estimation) written in Python. QuaPy is built around the concept of a data sample, and provides implementations of the most important concepts from the quantification literature: standard quantification baselines, many advanced quantification methods, quantification-oriented model selection, and the evaluation measures and protocols commonly used for assessing quantification methods. QuaPy also integrates commonly used datasets and offers visualization tools that facilitate the analysis and interpretation of results.
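As a minimal sketch of what a QuaPy workflow typically looks like (the dataset name, the choice of method, and the exact call names are illustrative assumptions based on the library's public documentation, not taken from the text above):

```python
# Minimal QuaPy sketch (illustrative; dataset name and exact API details are assumptions)
import quapy as qp
from sklearn.linear_model import LogisticRegression

# load a built-in reviews dataset as (training, test) labelled collections
dataset = qp.datasets.fetch_reviews('imdb', tfidf=True)

# Adjusted Classify & Count: a classic aggregative quantification baseline
quantifier = qp.method.aggregative.ACC(LogisticRegression())
quantifier.fit(dataset.training)

# estimate class prevalences on the unlabelled test sample
estim_prev = quantifier.quantify(dataset.test.instances)
true_prev = dataset.test.prevalence()

# quantification-oriented evaluation: mean absolute error between prevalence vectors
print(qp.error.mae(true_prev, estim_prev))
```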
LeQua 2022 is a new lab for the evaluation of methods for “learning to quantify” in textual datasets, i.e., for training predictors of the relative frequencies of the classes of interest in sets of unlabelled textual documents. While these predictions could easily be obtained by first classifying all documents with a text classifier and then counting how many documents are assigned to each class, a growing body of literature has shown this approach to be suboptimal, and has proposed better methods. The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting. For each setting we provide data either in ready-made vector form or in raw document form.
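As a hedged, toy illustration of why naive classify-and-count is suboptimal (synthetic numbers, not part of the lab's material): a classifier with imperfect true/false positive rates biases the estimated prevalence, and the classic adjusted correction p = (p_CC - fpr) / (tpr - fpr) removes that bias when the rates are known.

```python
# Toy sketch (synthetic numbers): classify-and-count vs. adjusted classify-and-count
import numpy as np

rng = np.random.default_rng(0)
true_prev = 0.2                      # true prevalence of the positive class
tpr, fpr = 0.80, 0.10                # classifier's true/false positive rates

n = 10_000
labels = rng.random(n) < true_prev   # ground-truth labels of the unlabelled sample
# simulate classifier decisions with the given tpr/fpr
preds = np.where(labels, rng.random(n) < tpr, rng.random(n) < fpr)

p_cc = preds.mean()                  # classify-and-count estimate (biased towards fpr)
p_acc = (p_cc - fpr) / (tpr - fpr)   # adjusted classify-and-count correction

print(f"true={true_prev:.3f}  CC={p_cc:.3f}  ACC={np.clip(p_acc, 0, 1):.3f}")
```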
These files contain the tokenized reviews used for quantification experiments on text. IMDB is derived from the IMDB dataset of Maas et al., 2011 (https://ai.stanford.edu/~amaas/data/sentiment/). The version of the IMDB content in this dataset involves minimal processing with respect to the original dataset, and is provided to ensure the reproducibility of the experiments. The HP and Kindle datasets are Amazon reviews collected by the authors; the reviews are about the books in the Harry Potter series and about the Kindle e-book reader, respectively.
Cross-lingual Text Classification (CLC) consists of automatically classifying, according to a common set C of classes, documents each written in one of a set of languages L, and of doing so more accurately than when naively classifying each document via its corresponding language-specific classifier. In order to obtain an increase in the classification accuracy for a given language, the system thus needs to also leverage the training examples written in the other languages. We tackle multilabel CLC via funnelling, a new ensemble learning method that we propose here. Funnelling consists of generating a two-tier classification system in which all documents, irrespective of language, are classified by the same (2nd-tier) classifier. For this classifier, all documents are represented in a common, language-independent feature space consisting of the posterior probabilities generated by the 1st-tier, language-dependent classifiers. This allows the classification of all test documents, of any language, to benefit from the information present in the training documents of all languages.
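A minimal sketch of the two-tier idea described above, using scikit-learn (the feature extraction, the choice of learners, and the use of in-sample posteriors for training the 2nd tier are illustrative simplifications; the actual funnelling method also addresses probability calibration and related issues not shown here):

```python
# Sketch of a two-tier "funnelling-style" ensemble (illustrative, not the authors' code).
# 1st tier: one calibrated classifier per language; 2nd tier: a single meta-classifier
# trained on the posterior probabilities, which form a language-independent space.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

def train_funnelling(data_by_lang):
    """data_by_lang: dict lang -> (X, Y), with language-specific features X and a
    multilabel indicator matrix Y over the same set of classes for every language."""
    first_tier, Z, Y_all = {}, [], []
    for lang, (X, Y) in data_by_lang.items():
        clf = OneVsRestClassifier(CalibratedClassifierCV(LinearSVC()))
        clf.fit(X, Y)
        first_tier[lang] = clf
        Z.append(clf.predict_proba(X))   # posteriors = common, language-independent space
        Y_all.append(Y)
    meta = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    meta.fit(np.vstack(Z), np.vstack(Y_all))
    return first_tier, meta

def predict_funnelling(first_tier, meta, X, lang):
    # any test document, whatever its language, is classified by the same 2nd-tier model
    return meta.predict(first_tier[lang].predict_proba(X))
```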
Code to reproduce the experiments reported in the paper: Corbara, S., Moreo, A., Sebastiani, F., & Tavoni, M. "The Epistle to Cangrande Through the Lens of Computational Authorship Verification." In International Conference on Image Analysis and Processing, pp. 148-158. Springer, Cham, 2019.
Efficiency and efficacy are two desirable properties of the utmost importance for any evaluation metric having to do with Standard Dynamic Range (SDR) or High Dynamic Range (HDR) imaging. However, these properties are hard to achieve simultaneously. On the one hand, metrics like HDR-VDP2.2 are known to mimic the human visual system (HVS) very accurately, but their high computational cost prevents their widespread use in large evaluation campaigns. On the other hand, computationally cheaper alternatives like PSNR or MSE fail to capture many of the crucial aspects of the HVS. In this work, we try to get the best of both worlds: we present NoR-VDPNet++, an improved variant of a previous deep-learning-based metric that distils HDR-VDP2.2 into a convolutional neural network (CNN).
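A hedged sketch of the general distillation idea (training a no-reference CNN to regress the scores produced by an expensive full-reference metric); the tiny network, the `hdr_vdp22` scoring wrapper, and the training-step layout below are placeholders for illustration, not the actual NoR-VDPNet++ design:

```python
# Sketch of metric distillation: a CNN learns to predict the teacher metric's score
# from the distorted image alone, so no reference image is needed at inference time.
# `hdr_vdp22(reference, distorted)` is a hypothetical wrapper around the real metric.
import torch
import torch.nn as nn

class QualityRegressorCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 1)   # a single quality score per image

    def forward(self, x):
        return self.head(self.features(x).flatten(1)).squeeze(1)

def distillation_step(model, optimizer, reference, distorted, hdr_vdp22):
    with torch.no_grad():
        target = hdr_vdp22(reference, distorted)   # teacher score from the full metric
    pred = model(distorted)                        # student sees only the distorted image
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```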
ACM Transactions on Knowledge Discovery from Data, 2022
Obtaining high-quality labelled data for training a classifier in a new application domain is often costly. Transfer Learning (a.k.a. “Inductive Transfer”) tries to alleviate these costs by transferring, to the “target” domain of interest, knowledge available from a different “source” domain. In transfer learning the lack of labelled information from the target domain is compensated for by the availability, at training time, of a set of unlabelled examples from the target distribution. Transductive Transfer Learning denotes the transfer-learning setting in which the only set of target documents that we are interested in classifying is known and available at training time. Although this definition is indeed in line with Vapnik’s original definition of “transduction”, current terminology in the field is confused. In this article, we discuss how the term “transduction” has been misused in the transfer-learning literature, and propose a clarification consistent with the original characterization of transduction.
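As an illustrative sketch of the data-availability contract described above (not from the article): in the transductive setting, the exact target documents to be classified are already on hand at training time and may be exploited directly, for example by fitting the text representation on source and target documents jointly.

```python
# Illustrative sketch of a transductive setup: the target test documents are known,
# unlabelled, and available at training time (hypothetical helper, not the paper's code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def transductive_transfer_classify(source_docs, source_labels, target_test_docs):
    # the representation is fitted on source + target documents, which is only
    # possible because the target set to classify is fixed and known in advance
    vectorizer = TfidfVectorizer(sublinear_tf=True)
    vectorizer.fit(source_docs + target_test_docs)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vectorizer.transform(source_docs), source_labels)
    return clf.predict(vectorizer.transform(target_test_docs))
```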
Talks by Alejandro Moreo
http://mediaeventi.unipi.it/category/video/Nuove-inchieste-sull039epistola-a-Cangrande-parte-prima/96037e24b19af8f14623658b783e07fb/184