A method for the data mining task of data classification, suitable to be implemented on massively parallel architectures, is proposed. The method combines genetic programming and simulated annealing to evolve a population of decision trees. A cellular automaton is used to realise a fine-grained parallel implementation of genetic programming through the diffusion model, and the annealing schedule is used to decide the acceptance of a new solution. Preliminary experimental results, obtained by simulating the behaviour of the cellular ...
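As a rough illustration of how an annealing schedule can decide whether a new solution is accepted, here is a minimal sketch. The abstract does not state the acceptance rule or the cooling law, so the Metropolis criterion, the geometric schedule, the convention that lower fitness is better, and the names `accept` and `geometric_schedule` are all assumptions made for this example.

```python
import math
import random


def accept(current_fitness, candidate_fitness, temperature, rng=random):
    """Metropolis-style acceptance (assumed rule; fitness is minimised here).

    A candidate that is no worse is always accepted. A worse candidate is
    accepted with probability exp(-delta / temperature).
    """
    if candidate_fitness <= current_fitness:
        return True
    if temperature <= 0:
        return False
    delta = candidate_fitness - current_fitness
    return rng.random() < math.exp(-delta / temperature)


def geometric_schedule(t0, alpha, steps):
    """Hypothetical cooling schedule: T_k = t0 * alpha**k for k = 0..steps-1."""
    return [t0 * alpha ** k for k in range(steps)]
```

In a cellular setting, each cell would apply `accept` to decide whether its newly generated tree replaces the current one, with the temperature lowered at each generation.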
A parallel genetic programming approach to induce decision trees in large data sets is presented. A population of trees is evolved by employing the genetic operators, and every individual is evaluated by using a fitness function based on the J-measure. The method is able to deal with large data sets since it uses a parallel implementation of genetic programming through the grid model and an out-of-core technique for those data sets that do not fit in main memory. Preliminary experiments on data sets from the UCI machine learning repository give good classification outcomes and assess the scalability of the method.
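For readers unfamiliar with the J-measure, the sketch below computes it for a single rule "if Y = y then X = x" using the standard definition due to Smyth and Goodman: J = p(y) [ p(x|y) log2(p(x|y)/p(x)) + (1 - p(x|y)) log2((1 - p(x|y))/(1 - p(x))) ]. How the abstract's fitness function combines these values over the paths of a tree is not stated, so this shows only the per-rule quantity; the function name and argument names are hypothetical.

```python
import math


def j_measure(p_y, p_x, p_x_given_y):
    """J-measure of the rule "if Y = y then X = x".

    p_y          probability that the rule's condition holds
    p_x          prior probability of the predicted class (0 < p_x < 1)
    p_x_given_y  probability of the class given the condition
    """
    def term(p, q):
        # By convention 0 * log(0 / q) is taken to be 0.
        return 0.0 if p == 0 else p * math.log2(p / q)

    return p_y * (term(p_x_given_y, p_x) + term(1 - p_x_given_y, 1 - p_x))
```

A rule whose condition tells nothing about the class (p(x|y) equal to p(x)) scores zero; the score grows as the rule becomes both more frequent and more informative.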
An extension of Cellular Genetic Programming for data classification to induce an ensemble of predictors is presented. Each classifier is trained on a different subset of the overall data, then they are combined to classify new tuples by applying a simple majority voting algorithm, like bagging. Preliminary results on a large data set show that the ensemble of classifiers trained on a sample of the data obtains higher accuracy, at a much lower computational cost, than a single classifier that uses the entire data set.
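The simple majority voting step described above can be sketched as follows. This is only an illustration: classifiers are modelled as plain callables returning a class label, and the tie-breaking behaviour (the label that was voted first among the tied ones wins) is an assumption, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(classifiers, tuple_):
    """Return the class label predicted by most classifiers for one tuple.

    `classifiers` is any non-empty sequence of callables mapping a tuple
    to a label (a stand-in for the evolved decision trees).
    """
    votes = Counter(clf(tuple_) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Usage with three toy classifiers over a one-attribute tuple.
ensemble = [
    lambda t: "pos" if t[0] > 0 else "neg",
    lambda t: "pos" if t[0] > 5 else "neg",
    lambda t: "pos" if t[0] > -5 else "neg",
]
label = majority_vote(ensemble, (3,))
```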
An extension of Cellular Genetic Programming for data classification with the boosting technique is presented and a comparison with the bagging-like majority voting approach is performed. The method is able to deal with large data sets that do not fit in main memory since each classifier is trained on a subset of the overall training data. Experiments showed that, by using a sample of reasonable size, the extension with these voting algorithms enhances classification accuracy at a much lower computational cost.
In this paper an intrusion detection algorithm based on GP ensembles is proposed. The algorithm runs on a distributed hybrid multi-island model-based environment to monitor security-related activity within a network. Each island contains a cellular genetic program whose aim is to generate a decision-tree predictor, trained on the local data stored in the node. Every genetic program operates cooperatively, yet independently of the others, by taking advantage of the cellular model to exchange the outermost individuals of the population. After the classifiers are computed, they are collected to form the GP ensemble. Experiments on the KDD Cup 1999 Data show the validity of the approach.
IEEE Transactions on Applications and Industry, 1998
A new parallel hybrid method for solving the satisfiability problem that combines cellular genetic algorithms and the random walk (WSAT) strategy of GSAT is presented. The method, called CGWSAT, uses a cellular genetic algorithm to perform a global search on a random initial population of candidate solutions and a local selective generation of new strings. The global search is specialized into a local search by adopting the WSAT strategy. CGWSAT has been implemented on a Meiko CS-2 parallel machine using a two-dimensional cellular automaton as a parallel computation model. The algorithm has been tested on randomly generated problems and some classes of problems from the DIMACS test set.
A new parallel implementation of genetic programming based on the cellular model is presented and compared with the island model approach. Despite the widespread belief that the cellular model is not suitable for parallel genetic programming implementations, experimental results show better convergence with respect to the island approach, good scale-up behaviour and a nearly linear speed-up.
Some steps of the design of a data dictionary with the use of a particular methodology are represented by means of logic rules augmented with integrity constraints defining illegal data design. The presence of mutually incompatible concepts is easily revealed by asking for satisfiability of integrity constraints. Furthermore, it is possible to obtain the hypotheses explaining the presence of illegality by exploiting abductive reasoning. To this end a new proposal for the computation of such hypotheses, based on a suitable manipulation of minimal three-valued models of the logic program, is presented.
Papers by Clara Pizzuti