Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets into a common space in which the source dataset is informative for training while the divergence between source and target is minimized. The most popular domain adaptation solutions are based on training neural networks that combine classification and adversarial learning modules, frequently making them both data-hungry and difficult to train. We present a method called Domain Adaptation Principal Component Analysis (DAPCA) that identifies a linear reduced data representation useful for solving the domain adaptation task. The DAPCA algorithm introduces positive and negative weights between pairs of data points and generalizes the supervised extension of principal component analysis. DAPCA is an iterat...
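The pairwise-weighting idea can be illustrated with a minimal Python/NumPy sketch. It is not the DAPCA algorithm: its iterative procedure and the weights it assigns to target-domain pairs are not reproduced here. It shows only the supervised-PCA-style step that DAPCA is said to generalise, namely finding orthonormal directions that maximise the weighted sum of squared projected pairwise distances. Positive weights push a pair apart in the projection and negative weights pull it together. The function names and the weight scheme (1 for pairs from different classes, -alpha for pairs from the same class) are illustrative assumptions.

```python
import numpy as np

def pairwise_weighted_pca(X, W, n_components=2):
    """Top directions maximising sum_ij W[i, j] * ||V^T (x_i - x_j)||^2.

    Hypothetical helper for illustration. Uses the identity
    sum_ij W_ij (x_i - x_j)(x_i - x_j)^T = 2 X^T (D - W) X
    for symmetric W, where D = diag(row sums of W).
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    W = 0.5 * (W + W.T)                      # symmetrise the weights
    L = np.diag(W.sum(axis=1)) - W           # weighted graph Laplacian
    Q = 2.0 * X.T @ L @ X                    # weighted pairwise scatter matrix
    eigvals, eigvecs = np.linalg.eigh(Q)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order]

def supervised_weights(labels, alpha=0.5):
    """Assumed scheme: +1 between classes, -alpha within a class, 0 on the diagonal."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    W = np.where(same, -alpha, 1.0)
    np.fill_diagonal(W, 0.0)
    return W

# Toy data: two classes separated along x, with the larger spread along y.
ys = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
X = np.vstack([np.column_stack([-np.ones(5), ys]),
               np.column_stack([np.ones(5), ys])])
labels = np.array([0] * 5 + [1] * 5)
V, vals = pairwise_weighted_pca(X, supervised_weights(labels, alpha=0.5))
print(vals)     # eigenvalues close to [200, 100]
print(V[:, 0])  # first direction is +/-(1, 0), the class-separating axis
```

Because the objective depends only on differences between points, no centring is needed. In this toy example ordinary PCA would favour the y axis (variance 2 against 1), whereas the weighted objective picks the x axis that separates the classes.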
Drug use disorder is characterised by several terms: addiction, dependence, and abuse. We discuss the notion of psychoactive substance and the relations between the existing definitions. The personality traits which may be important for predisposition to drug use are introduced: the Five-Factor Model, impulsivity, and sensation-seeking. A number of studies have shown that personality traits are associated with drug consumption, and the previous pertinent results are reviewed. A database with information on 1,885 respondents and their usage of 18 drugs is introduced. The results of our study are briefly outlined: the personality traits (Five-Factor Model, impulsivity, and sensation-seeking) together with simple demographic data make it possible to predict the risk of consumption of individual drugs, and personality profiles are constructed for users of different drugs. In particular, the groups of heroin and ecstasy users are significantly different, and there exist three correlation pleiades of drugs. These are clusters of drugs with correlated consumption, centred around heroin, ecstasy, and benzodiazepines.
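To make the notion of a correlation pleiad concrete, here is a minimal Python/NumPy sketch under a simple assumed reading: a pleiad centred on a chosen drug is that drug plus every drug whose consumption correlates positively with it above a threshold. The exact construction used in the study may differ. The function name, the 0.4 threshold, the generic drug names A to D and the 0/1 usage matrix are all hypothetical and are not data from the database.

```python
import numpy as np

def correlation_pleiad(usage, names, centre, threshold=0.4):
    """Hypothetical helper: the centre variable plus all variables whose
    Pearson correlation with it is at least `threshold`.

    usage: array of shape (respondents, drugs); columns follow `names`.
    Only positive correlation counts, since a pleiad groups drugs with
    correlated (co-occurring) consumption.
    """
    corr = np.corrcoef(np.asarray(usage, dtype=float), rowvar=False)
    c = names.index(centre)
    return [name for j, name in enumerate(names)
            if j == c or corr[c, j] >= threshold]

# Hypothetical 0/1 user indicators for six respondents and four drugs.
names = ["A", "B", "C", "D"]
usage = np.array([[1, 1, 0, 1],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]])
print(correlation_pleiad(usage, names, "A"))                 # ['A', 'B']
print(correlation_pleiad(usage, names, "A", threshold=0.3))  # ['A', 'B', 'D']
```

In this toy matrix A and B are always used together (correlation 1), C is used exactly when A is not (correlation -1), and D correlates with A at 1/3, so lowering the threshold from 0.4 to 0.3 admits D into the pleiad around A. Pleiades built this way around different centres may overlap.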
Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions; recent single-cell transcriptomic studies of developing embryos are notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most existing computational methods are based on exploring local neighbourhood relations between data points, a step that can perform poorly in the case of multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximating datasets with complex structures which does not require computing the complete data distance matrix or the data point neighbourhood graph. This method is able to withstand high levels of noise and is capable of approximating complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently ...
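The core idea of approximating data with an elastic graph can be sketched for the simplest topology, a chain (an elastic principal curve). This is a minimal Python/NumPy illustration of the elastic-energy principle, not the ElPiGraph implementation. The growth of branching topologies, ensembles and consensus graphs is not reproduced, and the function name and the values of `lam` (stretching) and `mu` (bending) are assumptions. For a fixed assignment of points to nodes, the energy

(1/N) Σ_k Σ_{x→k} ||x − y_k||² + λ Σ_edges ||y_i − y_{i+1}||² + μ Σ_stars ||y_{i−1} − 2y_i + y_{i+1}||²

is quadratic in the node positions, so each step reduces to one linear solve. Only point-to-node distances are computed, never the full point-to-point distance matrix.

```python
import numpy as np

def fit_elastic_curve(X, n_nodes=10, lam=0.01, mu=0.1, n_iter=50):
    """Hypothetical sketch: fit a chain of n_nodes (>= 2) nodes to X by
    alternating nearest-node assignment and minimisation of elastic energy."""
    X = np.asarray(X, dtype=float)
    N = len(X)
    # Initialise nodes evenly along the first principal component.
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    proj = (X - mean) @ Vt[0]
    Y = mean + np.linspace(proj.min(), proj.max(), n_nodes)[:, None] * Vt[0]

    # Stretching term: Laplacian of the chain graph.
    L = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes - 1):
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0
        L[i + 1, i] -= 1.0
    # Bending term: second differences over consecutive node triples.
    B = np.zeros((max(n_nodes - 2, 0), n_nodes))
    for i in range(1, n_nodes - 1):
        B[i - 1, i - 1], B[i - 1, i], B[i - 1, i + 1] = 1.0, -2.0, 1.0
    elastic = lam * L + mu * B.T @ B

    labels = None
    for _ in range(n_iter):
        # Assign each point to its nearest node (N x n_nodes distances only).
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # partition is stable, so Y is already optimal for it
        labels = new_labels
        counts = np.bincount(labels, minlength=n_nodes)
        sums = np.zeros_like(Y)
        np.add.at(sums, labels, X)  # per-node sum of assigned points
        # Setting the gradient to zero gives (diag(n_k)/N + elastic) Y = S/N.
        Y = np.linalg.solve(np.diag(counts / N) + elastic, sums / N)
    return Y, labels

# Usage: points on a straight line; the fitted nodes stay on that line.
x = np.linspace(-1.0, 1.0, 200)
Y, labels = fit_elastic_curve(np.column_stack([x, 0.5 * x]))
print(Y.round(3))
```

The linear system is always solvable when at least one node has points, because the data term breaks the translation invariance of the elastic terms. Larger `lam` and `mu` give a shorter, smoother curve at the cost of a looser fit to the data.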
Conference of the International Federation of Classification Societies, University of Bologna, 7th July 2015.
The problem of identifying pairs of loci associated with heat tolerance in yeasts is considered. Interactions of Quantitative Trait Loci (QTL) in heat-selected yeast are analysed by comparing them to an unselected pool of random individuals. Data on individual F12 progeny selected for heat tolerance, which have been genotyped at 25 locations identified by sequencing a selected pool, are re-examined. 960 individuals were genotyped at these locations, and multi-locus genotype frequencies were compared to those of 172 sequenced individuals from the original unselected pool. We use Relative Information Gain (RIG) to analyse associations between loci. Correlation analysis over many pairs of loci requires multiple testing methods. Two multiple testing approaches are applied for selecting associations: the False Discovery Rate (FDR) method in the version suggested by J.D. Storey and R. Tibshirani, and a specially developed Bootstrap Test of ordered RIG (BToRIG). BToRIG demonstrates slightly higher sensitivity than the FDR approach does for FDR=1. The statistical analysis of entropy and RIG in genotypes of a selected population reveals more interactions than previously seen. Importantly, this is done in comparison to the unselected population's genotypes to account for inherent biases in the original population.
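As an illustration of the association measure, the following minimal Python sketch estimates Relative Information Gain between two loci from paired genotype observations. It assumes the common definition RIG(X | Y) = (H(X) − H(X | Y)) / H(X), i.e. mutual information normalised by the entropy of X; the paper's exact normalisation and estimator may differ. The function names and the 0/1 allele coding in the example are hypothetical, and neither the FDR nor the BToRIG procedure is reproduced.

```python
import numpy as np
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def relative_information_gain(x, y):
    """Hypothetical helper: RIG(x | y) = (H(x) - H(x | y)) / H(x).

    x, y: genotypes of the same individuals at two loci.
    Returns a value in [0, 1]; 0 if locus x is monomorphic (H(x) = 0).
    """
    x, y = list(x), list(y)
    hx = entropy(x)
    if hx == 0.0:
        return 0.0
    # Conditional entropy via the chain rule: H(x | y) = H(x, y) - H(y).
    h_x_given_y = entropy(list(zip(x, y))) - entropy(y)
    return (hx - h_x_given_y) / hx

# Hypothetical allele codes (0/1) for four individuals at two loci.
print(relative_information_gain([0, 0, 1, 1], [0, 0, 1, 1]))  # fully linked: 1.0
print(relative_information_gain([0, 0, 1, 1], [0, 1, 0, 1]))  # independent: 0.0
```

Because many locus pairs are scored at once, raw RIG values of this kind would still need a multiple testing step before any pair is reported as associated.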
Papers by Evgeny Mirkes