Machine learning for data mining applications in bioinformatics aims to extract new knowledge that supports an improved and more effective diagnosis process for patients. In this paper, we introduce an adaptive ensemble learning method for classifying high-dimensional multi-class imbalanced genomic data. The aim is to design and develop an optimal ensemble method for information discovery on genomic data that improves the prediction accuracy of DNA variant classification. The proposed method is based on an ensemble of decision trees, data pre-processing, feature selection and grouping. It converts an imbalanced genomic dataset into multiple balanced ones and then builds a number of decision trees on these datasets with specific feature groups. The outputs of these trees are combined by majority voting to classify new instances. In this empirical study, different ensemble predictive modelling techniques such as Random Forest, Boosting and Bagging were compared with the proposed ensemble method. The experimental results on genomic data (148 Exome datasets) of Brugada syndrome from the Centre of Medical Genetics, VUB UZ Brussel show that the proposed method is usually superior to the conventional ensemble learning algorithms when classifying high-dimensional multi-class imbalanced genomic data.
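The two steps that frame the ensemble described above, splitting an imbalanced dataset into several balanced ones and combining per-tree outputs by majority voting, can be illustrated with a minimal sketch. The abstract does not say how the balanced datasets are built, so random undersampling to the minority-class size is an assumption here, and the names `balanced_subsets` and `majority_vote` are hypothetical. Tree training and feature grouping are deliberately left out.

```python
import random
from collections import Counter, defaultdict


def balanced_subsets(labels, n_subsets, seed=0):
    """Return n_subsets index lists, each with the same count per class.

    Assumption: balancing is done by random undersampling of every class
    down to the minority-class size (the abstract does not specify this).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    minority = min(len(idxs) for idxs in by_class.values())
    subsets = []
    for _ in range(n_subsets):
        subset = []
        for idxs in by_class.values():
            # Draw `minority` distinct instances from each class.
            subset.extend(rng.sample(idxs, minority))
        subsets.append(sorted(subset))
    return subsets


def majority_vote(predictions):
    """Combine one predicted label per tree into a single label."""
    return Counter(predictions).most_common(1)[0][0]
```

Each index list would be used to train one decision tree, and `majority_vote` would then be applied to the labels those trees predict for a new instance.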
High-dimensional genomic big data with hundreds of features present a big challenge in cluster analysis. Usually, genomic data are noisy and their features are correlated. Also, different subspaces exist in high-dimensional genomic data. This paper presents a feature selection and grouping method for ensemble clustering of high-dimensional genomic data. Two of the most popular clustering methods, k-means and similarity-based clustering, are used for ensemble clustering. Ensemble clustering is more effective in clustering high-dimensional complex data than the traditional clustering algorithms. In this paper, we cluster unlabeled genomic data (148 Exome datasets) of Brugada syndrome from the Centre of Medical Genetics, VUB UZ Brussel using the SimpleKMeans, XMeans, DBScan, and MakeDensityBasedCluster algorithms and compare the clustering results with the proposed ensemble clustering method. Furthermore, we use a biclustering (δ-Biclustering) algorithm on each cluster to find the sub-matrices in the genomic data, which clusters both instances and features simultaneously.
We compare three adaptive MCMC samplers to the Metropolis-Hastings algorithm with optimal proposal distribution, which serves as our benchmark. We transform a simple Evolution Strategy algorithm into a sampler and show that it already outperforms the other samplers on the test suite used in the initial research on adaptive MCMC.
An empirical comparative study is made of a sample of action selection policies on a test suite of the Bernoulli multi-armed bandit with K = 10, K = 20 and K = 50 arms, for each of which we consider several success probabilities. For such problems the rewards are either Success or Failure with unknown success rate. Our study focuses on ε-greedy, UCB1-Tuned, Thompson sampling, the Gittins index policy, the knowledge gradient and a new hybrid algorithm. The last two are not well known in computer science. In this paper, we examine policy dependence on the horizon and report results which suggest that a new hybridized procedure based on Thompson sampling improves on its regret.
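For readers unfamiliar with the setting, the following is a minimal sketch of plain Thompson sampling on a Bernoulli bandit. It is not the hybrid procedure from the paper, whose details the abstract does not give. The uniform Beta(1, 1) prior, the function name `thompson_sampling` and the use of expected regret as the reported measure are assumptions made for illustration.

```python
import random


def thompson_sampling(success_probs, horizon, seed=0):
    """Run Thompson sampling on a Bernoulli bandit.

    Assumption: each arm starts with a uniform Beta(1, 1) prior.
    Returns the number of pulls per arm and the cumulative expected regret.
    """
    rng = random.Random(seed)
    k = len(success_probs)
    successes = [0] * k
    failures = [0] * k
    best = max(success_probs)
    regret = 0.0
    for _ in range(horizon):
        # Sample a plausible success rate for each arm from its posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)
        ]
        arm = max(range(k), key=samples.__getitem__)
        # Reward is Success (1) or Failure (0) with the arm's unknown rate.
        if rng.random() < success_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        regret += best - success_probs[arm]
    pulls = [s + f for s, f in zip(successes, failures)]
    return pulls, regret
```

Calling `thompson_sampling([0.1, 0.9], horizon=2000)` returns the pull counts and the regret accumulated over the horizon, which is the quantity whose dependence on the horizon the study examines.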
We define shortest-path Gaussian kernel basis functions over state graphs and state-action graphs. We empirically demonstrate that these new basis functions used in linear parametric function approximation outperform basis functions defined on the state space, the state graph and the state-action graph.
Student retention has become one of the most important priorities for decision makers in higher learning institutions (HLI). Improving student retention starts with a thorough understanding of the reasons behind the attrition. In this study, using student demographic and institutional data along with several business intelligence (BI) techniques, we developed a prototype to predict the likelihood of student persistence or dropout. This study used classification models generated using the Waikato Environment for Knowledge Analysis (WEKA). The model was built using 10-fold cross-validation and the holdout method (60% of the data was used for training and the remainder for testing and validation). Random sampling techniques were used in selecting the datasets. The attribute selection analysis of the models revealed that the student's age on entry, parents' occupation, student health and financial variables are among the most important predictors of the phenomenon. Results of the classifiers were comp...
We empirically compare the knowledge gradient exploration policy with the ε-greedy one in online least-squares policy iteration on a testbed of 2 infinite-horizon Markov decision problems. It is shown that the knowledge gradient, although it does not have parameters to be tuned, performs as well as a well-tuned ε-greedy exploration policy.
In this paper, we propose the use of schemata bandits for optimisation. This technique is a subclass of hierarchical bandits where the bandits are schemata. We investigate its use on a benchmark of binary combinatorial optimization problems, the Maximum Satisfiability (MAXSAT) problem. We compare performance with hierarchical Bayesian Optimization Algorithms (hBOAs), namely GSAT and WALKSAT. Results suggest that using a bandit strategy enhances solver performance.
A stochastic multi-objective multi-armed bandit problem is a particular type of multi-objective (MO) optimization problem where the goal is to find the optimal arms and play them fairly. To solve it, we propose the annealing linear scalarized algorithm, which transforms the MO optimization problem into a single-objective one by using a linear scalarization function, and finds and plays the optimal arms fairly by using a decaying parameter εt. We empirically compare the linear scalarized-UCB1 algorithm with the annealing linear scalarized algorithm on a test suite of multi-objective multi-armed bandit problems with independent Bernoulli distributions, using different approaches to define the weight sets: the standard approach, the adaptive approach and the genetic approach. We conclude that the performance of the annealing scalarized and the scalarized UCB1 algorithms depends on the weight approach used.
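A linear scalarization function collapses a reward vector into a single number through a weighted sum, so the arm that looks best depends on the chosen weights. The sketch below shows only that step. The annealing schedule for εt and the three weight-set approaches are not specified in the abstract and are not reproduced, and the names `linear_scalarize` and `best_arm` and the example mean vectors are hypothetical.

```python
def linear_scalarize(reward_vector, weights):
    """Weighted sum of the objectives: sum_j weights[j] * reward_vector[j]."""
    return sum(w * r for w, r in zip(weights, reward_vector))


def best_arm(mean_vectors, weights):
    """Index of the arm with the highest scalarized mean reward."""
    scores = [linear_scalarize(m, weights) for m in mean_vectors]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical two-objective bandit: one mean reward vector per arm.
means = [(0.8, 0.2), (0.2, 0.8), (0.5, 0.5)]
print(best_arm(means, (0.9, 0.1)))  # favours the first objective
print(best_arm(means, (0.1, 0.9)))  # favours the second objective
```

Different weight vectors select different arms here, which gives some intuition for why the way the weight set is defined can affect the performance of scalarized algorithms.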
In order to use evolutionary algorithms (EAs), including genetic algorithms (GAs), in real time or for hard real-world applications, their current speed has to be increased by several orders of magnitude. This section reviews research activities related to hardware realizations of EAs. First, we consider parallel implementations of GAs on different parallel machines. Then, we focus on more dedicated hardware systems for EAs. For example, a TSP GA machine, a wafer-scale GA machine, and vector processing of GA operators are described. Here, we ...
We study how a group of adaptive agents can coordinate when competing for limited resources. A popular game-theoretic model for this is the Minority Game. In this article we show that the coordination among learning agents can improve when agents use different learning parameters or even evolve their learning parameters. Better coordination leads to fewer resources being wasted and agents achieving higher individual performance. We also show that learning algorithms which achieve good results when all agents use that same ...
We study learning in the time-dependent Minority Game (MG). The MG is a repeated conflicting interest game involving a large number of agents. So far, the learning mechanisms studied were rather naive and involved only exploitation of the best strategy so far at the expense of exploring new strategies. Instead, we use a reinforcement learning method called Q-learning and show how it improves the results on MG extensions of increasing difficulty.
A large body of public domain software exists which addresses standard implementations of the Genetic Programming paradigm. Nevertheless, researchers are frequently confronted with the lack of flexibility and reusability of the tools when, for instance, one wants to alter the genotype representation or the overall behavior of the evolutionary process. This paper addresses the construction of an object-oriented Genetic Programming framework using design patterns to increase its flexibility and reusability.
Biological development is a stunning mechanism that allows robust generation of complex structures from a linear building plan. This makes it an interesting source of inspiration for solving problems where direct manipulation of a higher-order structure is hard, and the generative building plan can be used as a substitute for indirect manipulation of the unfolded structure. In this paper we propose CA-DEV as a simple computational model for development of rules for non-uniform 2D cellular automata. While being a simplified ...
In order to simplify optimization in many-objective search spaces, we propose the Cartesian product of scalarization functions to reduce the number of objectives of the search space. To achieve this, we design a stochastic Pareto local search algorithm and demonstrate its use on examples of product functions. We test this algorithm on generated many-objective quadratic assignment instances with correlated flow matrices. The experimental tests show a superior performance for the local search algorithms using product functions instead of the standard scalarization functions. For instances with strong correlation between the flow matrices, product-based algorithms perform similarly to the standard Pareto local search.
In this work, we develop a new paradigm, called Meta-Evolutionary Algorithms, motivated by the challenging, continuous problems encountered in the domain of satisfiability in fuzzy logics (SAT∞). In Meta-Evolutionary Algorithms, the individuals in a population are optimization algorithms themselves. Mutation at the meta-population level is handled by performing an optimization step in each optimization algorithm, and recombination at the meta-population level is handled by exchanging information between different algorithms. We analyse different recombination operators and empirically show that simple Meta-Evolutionary Algorithms are able to outperform CMA-ES on a set of SAT∞ benchmark problems.
We study honest signaling in the Philip Sidney game. Until now, researchers concentrated on verifying under what circumstances honest signaling is an evolutionarily stable strategy (ESS). Whereas the concept of ESS assumes infinite populations, we analyze here, for the first time, the more realistic scenario where populations are finite, which allows us to study the effect of varying the population size with respect to the viability of honest signaling. We show that honest signaling is much less frequent than previously observed within the infinite population setting. We observe that population size has a similar effect as selection pressure, namely, the larger the population the more important the difference in fitness between the strategies. Our experiments reveal, furthermore, that evolutionary stability is not very predictive for the viability of honest signaling. Most surprisingly, we found cases where honest signaling is the most prevalent strategy but not evolutionarily stable.
Traffic jams and suboptimal traffic flows are ubiquitous in our modern societies, and they create enormous economic losses each year. Delays at traffic lights alone contribute roughly 10 percent of all delays in US traffic. As most traffic light scheduling systems currently in use are static, set up by human experts rather than being adaptive, the interest in machine learning approaches to this problem has increased in recent years. Reinforcement learning approaches are often used in these studies, as they require little pre-existing knowledge about traffic flows. Some distributed constraint optimization approaches have also been used, but focus on cases where the traffic flows are known. This paper presents a preliminary comparison between these two classes of optimization methods in a complex simulator, with the goal of eventually producing real-time algorithms that could be deployed in real-world situations.
In this paper the scheduling of n independent jobs on m non-identical machines is considered for a large concrete schedule space with 30 jobs and 6 machines. The schedule space is about 10^23, which is large enough that exhaustive systematic search for the optimal schedule is of limited use. The schedules are generated by agents that represent the jobs as they randomly select the machines on which the jobs should be processed. The schedules that are generated are evaluated using the makespan which is the total time taken for all ...
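The size of the schedule space and the evaluation of a randomly generated schedule can be checked with a short sketch. It assumes that a schedule is simply an assignment of each job to one machine (giving 6^30, roughly 2.2 × 10^23, assignments) and uses the standard definition of makespan as the largest total processing time on any machine. The function names and the processing-time table are hypothetical.

```python
import random


def schedule_space_size(n_jobs, n_machines):
    """Number of ways to assign each job to one machine (assumed model)."""
    return n_machines ** n_jobs


def random_schedule(n_jobs, n_machines, rng):
    """Each job (agent) independently picks a machine at random."""
    return [rng.randrange(n_machines) for _ in range(n_jobs)]


def makespan(schedule, proc_times):
    """Largest machine load; proc_times[job][machine] is the job's duration
    on that machine, so non-identical machines are supported."""
    loads = [0] * len(proc_times[0])
    for job, machine in enumerate(schedule):
        loads[machine] += proc_times[job][machine]
    return max(loads)


print(schedule_space_size(30, 6))  # an integer between 10**23 and 10**24
```

For instance, with `proc_times = [[2, 3], [4, 1], [5, 5]]` the schedule `[0, 1, 0]` loads machine 0 with 2 + 5 = 7 time units and machine 1 with 1, so its makespan is 7.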
We compare well-known action selection policies used in reinforcement learning, like ε-greedy and softmax, with lesser-known ones like the Gittins index and the knowledge gradient on bandit problems. The latter two perform very well in comparison. Moreover, the knowledge gradient can be generalized to problems other than bandits.
In this paper we propose a structure learning algorithm for Multi-Agent Causal Models, which are an extension of Causal Bayesian Networks to a distributed domain. It is assumed that there is no single agent that has all the information of the domain; instead, there are several agents, each having access to non-disjoint subsets of the domain variables. Every agent has a causal model, determined by an acyclic causal diagram and a joint probability distribution over its observed variables. We thoroughly study the problems ...
This paper considers weighted kernel functions for support vector machine learning with string data. More precisely, applications that rely on a context of a number of symbols before and after a target symbol will be considered. It will be shown how contexts can be organized in vectors, and subsequently different weighted kernel functions working directly on such contexts will be described. The weighted kernel functions allow every symbol in a context to be weighted according to its relevance for the determination of the class label of the ...
