Name : Survey1
Title : Genetic Algorithm Based on Evolution Strategy and the Application in Data Mining
2. Issue : 2009 First International Workshop on Education Technology and Computer Science
Authors : Xiaoyuan Zhu, Yongquan Yu, Xueyan Guo, Department of Computer, Guangdong Baiyun University, Guangzhou, Guangdong, 510450, China. E-mail: kobe_zhuxiaoyuan@163.com
3. Objective : The paper proposes applying the strengths of evolution strategy to the evolutionary process of the genetic algorithm. The optimized genetic algorithm is then used for mining association rules.
4. Methodology :
Shortcomings of traditional GA
The traditional genetic algorithm has the following shortcomings in the evolutionary process. Individuals produced by mutation have high fitness but are few in number, so the probability that they are eliminated is high. The majority of individuals in the colony (population) are in a similar state, which easily leads to local convergence of the genetic algorithm. Once the algorithm is trapped in local convergence, individuals close to the best value are always eliminated, and it cannot be judged whether the value found is the global best.
Improved genetic algorithm
The genetic algorithm based on evolution strategy makes the following improvements. First, the dissimilarity degree of the individuals in the colony is evaluated each time a century (a fixed number of generations) has evolved. The dissimilarity degree of two random individuals in the colony is as follows.
In the formula, l is the length of the gene chain, aj is the j-th gene in the gene chain of individual a, and bj is the j-th gene in the gene chain of individual b. For the whole colony, the dissimilarity degree of the colony is as follows.
In the formula, P is the colony size. The crossover probability and mutation probability are set as follows.
K is the ratio of the dissimilarity degree of the colony in the last century to that in the current century.
In the formulas, Pc' and Pm' are the crossover probability and mutation probability of the last century, and Pc and Pm are the crossover probability and mutation probability of the current century. In this way, the evolution of the current century builds on the last century: the original colony contains the excellent individuals of the last century. Otherwise, some new individuals are randomly produced, and the crossover probability and mutation probability are set anew, which enhances the diversity of the colony.
5. Data Set : 2050 groups of financial service data from a certain city are used as the experimental example; detailed data is as
6. Results : The parameters of the genetic algorithm are set as follows: the original colony has 100 individuals, the crossover probability is 0.85, the mutation probability is 0.02, the minimum support is 0.3, the minimum confidence is 0.6, and every 50 generations of evolution are regarded as a century. After 252 generations, some of the association rules obtained are shown in table 3.
With the traditional GA, the results are obtained after 850 generations.
7. Conclusion : The improved algorithm is faster. It can be applied to other domains and has great research value and application value.
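The dissimilarity formulas of Survey1 are not reproduced in these notes, so the following is only a minimal sketch of how such measures could look. It assumes (this is an assumption, not the paper's confirmed definition) that the pairwise dissimilarity is the fraction of differing gene positions, that the colony dissimilarity is the mean over all pairs, and that K divides the last century's value by the current one. The function names are hypothetical.

```python
from itertools import combinations


def pair_dissimilarity(a, b):
    """Fraction of the l gene positions at which a and b differ (assumed form)."""
    if len(a) != len(b):
        raise ValueError("gene chains must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def colony_dissimilarity(colony):
    """Mean pairwise dissimilarity over all P*(P-1)/2 pairs (assumed form)."""
    pairs = list(combinations(colony, 2))
    return sum(pair_dissimilarity(a, b) for a, b in pairs) / len(pairs)


def dissimilarity_ratio(last_colony, current_colony):
    """K: colony dissimilarity of the last century over the current one (assumed direction)."""
    return colony_dissimilarity(last_colony) / colony_dissimilarity(current_colony)


last_century = ["0000", "1111", "0101"]
current_century = ["0000", "0001", "0011"]
k = dissimilarity_ratio(last_century, current_century)
```

A value of K above 1 would indicate that the colony has become more uniform since the last century, which is the situation in which the notes describe injecting new random individuals. The sketch does not guard against a current colony whose individuals are all identical (zero dissimilarity).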
1. Name : Survey2
2. Title : A Data Mining Based Genetic Algorithm
3. Issue : Proceedings of the Fourth IEEE Workshop on Software Technologies for Future Embedded and Ubiquitous Systems and Second International Workshop on Collaborative Computing, Integration, and Assurance (SEUS-WCCIA06)
4. Authors : YI-TA WU (1), YOO JUNG AN (2), JAMES GELLER (2) and YIH-TYNG WU (3). (1) Computer Aided Diagnosis Group, Radiology Department, University of Michigan, Ann Arbor, MI 48105; (2) Semantic Web & Ontologies Lab, College of Computing Sciences, New Jersey Institute of Technology, Newark, NJ 07102; (3) Computer and Information Sciences, Nova Southeastern University, Fort Lauderdale, FL 33314. Contact: yitawu@umich.edu
5. Objective : A data mining-based GA is presented to efficiently improve the traditional GA (TGA).
6. Methodology : The paper gives a detailed description of recording the useful information, performing the DNA extraction, and performing the new GA operation (DNA implantation).
The algorithm of our data mining-based genetic algorithm is described below, and its flowchart is shown in a figure of the paper. Note that we separate the whole algorithm into two parts: the traditional GA and the DNA analysis. For simplicity, the traditional GA and the DNA analysis are denoted TGA and DA respectively.
Algorithm of Our Data Mining-Based GA:
1. Set up the environment parameters.
   TGA: Initialize the base population of chromosomes, and initialize the ratios of the GA operations (r1 for reproduction, r2 for crossover, and r3 for mutation, where r1 + r2 + r3 = 1).
   DA: Initialize the support and confidence arrays; set the DNA pool to be empty. Note that the support and confidence arrays are introduced in Section 3.1 of the paper.
2. Evaluate all of the chromosomes based on the fitness function.
   TGA: Obtain the better chromosomes based on their corresponding fitness values.
   DA: Record the important gene information for each high-quality chromosome by updating the support and confidence arrays.
3. Recombine new chromosomes based on the traditional GA operations.
   TGA: Recombine new chromosomes by using reproduction, crossover, and mutation.
4. Recombine new chromosomes based on the data mining-based GA operation.
   TGA: Check whether the DNA pool is empty. If it is empty, go to the next step; otherwise, two types of new chromosomes are generated.
   Type 1: Randomly select some chromosomes obtained from step 3, and then perform the new GA operation, DNA implantation, to generate new chromosomes.
   Type 2: Randomly select some chromosomes obtained from step 3, and then disable the genes of the chromosome if the genes appear in the DNA pool.
5. Repeat steps 2 to 5 until either of the following two conditions is reached.
   Condition 1 (Obtain the optimum solution): The predefined condition is satisfied, i.e. the obtained solution meets our expectation, or a constant number of iterations has been performed.
   TGA: Output the current best result as the optimum solution, and terminate the whole GA procedure.
   Condition 2 (Fall into a DNA trap): The obtained solution does not satisfy our requirements, but there is no improvement after a constant number of new evolutionary generations. Note that this constant number of iterations is much smaller than the one in condition 1.
   DA: Collect all the important genes based on the support and the confidence to form a new DNA, put the DNA into the DNA pool, and reset the support and confidence arrays. Then go to step 2.
7. Data Set : The watermarking problem is used to evaluate the data mining-based GA.
8. Results :
9. Conclusion : We have developed a data mining-based GA to reduce the number of evolutionary generations. The new GA operation, DNA implantation, is used to enhance the performance of the traditional GA by providing better evolutionary gene combinations, such that the advantages of parent chromosomes can be precisely maintained by their offspring. Experimental results show that our data mining-based GA successfully improves the performance of the traditional GA.
1. Name : Survey3
2. Title : Association Rules Data Mining in Manufacturing Information System based on Genetic Algorithms
3. Issue : 2004 3rd International Conference on Computational Electromagnetics and Its Applications Proceedings
4. Authors : Cunrong Li, Mingzhong Yang, Mechatronics School, Wuhan University of Technology, Wuhan, Hubei Province, China. E-mail: Cunrong-li@163.com
5. Objective : The genetic algorithm is used to mine association rules from the data set in the Manufacturing Information System (MIS).
6. Methodology : There are three main operators in GA: selection, crossover and mutation. Before the GA can be applied, some analysis is needed. First, the individual, the population of individuals and the genes must be identified in the MIS database. The next step is to implement the genetic algorithm and obtain the promising results. In the process of mining the MIS database, a set of history records can be regarded as a population of individuals, a single record as an individual, and the fields representing the properties of the table as the genes. The correspondence between these items and a table is shown in figure 1.
In this article, the application of the basic genetic algorithm to the MIS database is divided into six steps:
A. Encode.
B. Select the initial population of individuals, from which the excellent gene list set A1 is computed; select the second population of individuals, from which the excellent gene list set B1 is computed.
C. Save the excellent gene list sets above to the result set R.
D. Apply mutation to A1 and B1, cross the genes of A1 and B1, and generate a new set A2; then select the third population of individuals, from which the excellent gene list set B2 is computed.
E. Repeat the steps from B to C until there is no new excellent gene list.
F. Decode the result set R and generate the knowledge of association rules.
7. Data Set :
In order to analyze the efficiency of GA-based association rules in MIS, a test was carried out on a server computer: IBM Netfinity (Pentium 1G/512M RAM).
8. Results : Comparison of time used between the GA-based and Apriori algorithms. The experimental result in figure 3 shows that the GA-based association rule method is more efficient than the Apriori algorithm. The time used by the Apriori algorithm increases sharply as the amount of data increases, whereas the GA-based method takes less time. In this experiment, the original support and the original compatibility are both set to 5%. The results vary with the specific databases.
The amount of knowledge fetched in the experiment increases steadily over time, and the rules can be exhausted under predetermined conditions. The amount of knowledge as a function of time is shown in figure 4. As time passes, the amount of knowledge tends to stabilize.
9. Conclusion : The precision decreased a little, while the efficiency increased a lot. When used in a Manufacturing Information System, the method can be applied in many aspects.
1. Name : Survey4
2. Title : A Constraint-Based Genetic Algorithm Approach for Mining Classification Rules
3. Issue : IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART C: APPLICATIONS AND REVIEWS, VOL. 35, NO. 2, MAY 2005
4. Authors : Chaochang Chiu and Pei-Lun Hsu. C. Chiu is with the Department of Information Management, Yuan Ze University, Chungli, Taiwan 320, R.O.C. (e-mail: hiu@saturn.yzu.edu.tw). P.-L. Hsu is with the Department of Electronic Engineering, Ching Yun University, Chungli, Taiwan 320, R.O.C. (e-mail: hsupl@cyu.edu.tw). Digital Object Identifier 10.1109/TSMCC.2004.841919
5. Objective : We propose a constraint-based genetic algorithm (CBGA) approach to reveal more accurate and significant classification rules.
6. Methodology : A rule induction system that consists of three modules: the user interface, the symbol manager, and the constraint-based GA (CBGA). According to Fig. 1, the user interface module allows users to execute the following system operations: loading a constraint program; adding or retracting constraints; controlling the GA's parameter settings; monitoring the best solutions. Interesting knowledge or given constraints can be issued by either domain experts or other meta-knowledge mechanisms.
7. Data Set : In order to introduce the details of the proposed CBGA approach, a synthetic medical data set of patients' information is used for illustration. This data set includes the following attributes: age, sex, blood pressure (BP), the status of cholesterol (Cho), the values of Na and K, and the quantity (Qty) and frequency (Freq) of taking the drug. The prediction attribute is one of five drug types: Drug A, Drug B, Drug C, Drug D and Drug E.
8. Results : In comparison with a regular GA, CBGA achieves higher classification accuracy rates in rule induction for both UCI data sets. In addition, the rule sets discovered by CBGA not only have higher predictive accuracy, but also carry more significant knowledge in accordance with the users' preferences.
9. Future : The proposed CBGA is generic and problem independent. It is not required to develop any proprietary genetic operator or chromosome representation to interact with the constraint-based reasoning process. Even when the results are available, it is difficult to verify the model reliability.
1. Name : Survey5
2. Title : An Algorithm for Mining Association Rules Based on Improved Genetic Algorithm and its Application
3. Issue : 2009 Third International Conference on Genetic and Evolutionary Computing
4. Authors : Hong GUO (School of Computer & Control, Guilin University of Electronic Technology, Guilin Guangxi 541004, China), e-mail: myhg0@126.com; Ya ZHOU (School of Computer & Control, Guilin University of Electronic Technology, Guilin Guangxi 541004, China), e-mail: ccyzhou@gliet.edu.cn
5. Objective : The paper improves the genetic algorithm by adopting an adaptive mutation rate and an improved individual selection method, and applies the improved genetic algorithm to mining association rules.
6. Methodology : 1) An adaptive mutation rate method, to avoid excessive variation causing non-convergence or convergence to a local optimum; 2) An individual-based selection method, applied in the later stage of evolution, to prevent premature convergence caused by the rapid growth in the number of high-fitness individuals and the resulting lack of differences between individuals. With the adaptive mutation rate method, the mutation rate used in the early stages of evolution is as follows:
A phase-out method is used to improve selection in the later part of the genetic algorithm:
1) Sort the individuals by fitness;
2) The first 1/4 of the individuals are copied twice and the 1/4-2/4 part are copied once, and enter the next round of selection;
3) The 2/4-3/4 part of the individuals are retained and enter the next round of selection;
4) The 3/4-4/4 part of the individuals are eliminated and no longer enter the next round of selection.
Description of the Algorithm
Step 1: initialize the population p(0); the support S and the confidence C are given by the user;
Step 2: calculate the individual fitness of the current population p(t);
Step 3: apply the improved selection operation to the current population p(t);
Step 4: apply crossover to the current population p(t);
Step 5: apply mutation to the current population p(t);
Step 6: generate the new population p(t+1);
Step 7: compare with the termination generation T; if the termination condition is met, terminate and output the rules, otherwise go to Step 2.
7. Data Set : A database of student achievement in schools in recent years.
8. Results : The two curves denote the time cost of the Apriori algorithm and of the improved genetic algorithm for different values of minsup. The new algorithm reduces the number of unnecessary operations, streamlines the generation of frequent itemsets, and improves the efficiency of the algorithm.
9. Conclusion : The paper gives an improved genetic algorithm for mining association rules, with an example. The simulation results show the effectiveness and feasibility of the algorithm.
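The phase-out selection in the Survey5 methodology can be sketched as below. This is a minimal illustration, not the authors' code: it assumes that "copied twice" means two copies of each top-quarter individual in total, and that the middle two quarters each contribute one copy, so the population size is preserved. The function name and the `fitness` callable are hypothetical.

```python
def phase_out_selection(population, fitness):
    """Rank-based selection: top quarter twice, middle half once, bottom quarter dropped."""
    ranked = sorted(population, key=fitness, reverse=True)
    quarter = len(ranked) // 4
    top = ranked[:quarter]
    middle = ranked[quarter:len(ranked) - quarter]
    # Two copies of each top-quarter individual, one copy of each middle individual;
    # the lowest-ranked quarter is eliminated.
    return [ind for ind in top for _ in range(2)] + middle


# Toy example: individuals are integers and fitness is the value itself.
selected = phase_out_selection([3, 1, 4, 1, 5, 9, 2, 6], fitness=lambda x: x)
```

Because the duplicated top quarter exactly replaces the eliminated bottom quarter, the output has the same length as the input under this reading.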
1. Name : Survey6
2. Title : A Novel Genetic Algorithm Based on Image Databases for Mining Association Rules
3. Issue : 6th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2007)
4. Authors : Shangping Dai (1), Li Gao (1), Qiang Zhu (2), Changwu Zhu (1). (1) Department of Computer Science, Hua Zhong Normal University, Wuhan, 430079, P.R. China; (2) Zhejiang University of Media and Communication, Hangzhou, 310018, P.R. China. spdai@mail.ccnu.edu.cn
5. Objective : To present a novel spatial mining algorithm, called ARMNGA (Association Rules Mining in Novel Genetic Algorithm).
6. Methodology : Association rules mining based on a novel genetic algorithm.
Encoding
The paper employs natural numbers to encode the variable Aij. That is, for every column of the matrix A, the number of the row in which the element 1 occurs is regarded as a gene. The genes are independent of each other. They are denoted A1, A2, ..., Aj, ..., An, in which Aj ∈ [1, m], j ∈ [1, m], and several genes may take the same natural number.
The Fitness
Here, Wc + Ws = 1, Wc ≥ 0, Ws ≥ 0, Smin is the minimum support, and Cmin is the minimum confidence.
Reproduction Operator
We adopt the roulette selection strategy; each individual's reproduction probability is proportional to its fitness value.
Mutation Operator
The selection of the mutation probability is the vital point because it influences the behavior and performance of the ARMNGA. If it is too large, the ARMNGA will become a pure random search.
Here, pm1 = 0.1, pm2 = 0.001, fmax(X) is the maximum fitness value of the population, and f̄(X) is the average fitness value of the population.
7. Data Set : Image database.
8. Results : The runtime vs. the minimum support is reported for both algorithms, where the minimum support varies from 0.25% to 2% for the synthetic dataset. Our proposed algorithm runs 25 times faster than the Apriori algorithm, because a large number of candidates can be pruned by using the ARMNGA pruning strategy.
Figure 2 shows the runtime vs. the average size of transactions for both algorithms, where the average size of transactions varies from 4 to 14 for the synthetic dataset. We can deduce that ARMNGA has a higher convergence speed and a more reasonable selection scheme, which guarantees that the quality of the optimal solution does not degrade.
9. Conclusion : Theoretical analysis and experimental results show that ARMNGA is better than GA and ARM.
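The roulette selection strategy named in the Survey6 reproduction operator (and used for selection in Survey7) can be illustrated with a short sketch. This is the generic fitness-proportionate technique, not code from either paper; the function name, the `rng` parameter and the example data are assumptions made for illustration.

```python
import random


def roulette_select(population, fitness_values, k, rng=random):
    """Draw k individuals, each with probability proportional to its fitness."""
    total = sum(fitness_values)
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    chosen = []
    for _ in range(k):
        spin = rng.uniform(0, total)  # position of the pointer on the wheel
        cumulative = 0.0
        for individual, fit in zip(population, fitness_values):
            cumulative += fit
            if spin < cumulative:
                chosen.append(individual)
                break
        else:
            # Only reached if rounding pushes spin to the very end of the wheel.
            chosen.append(population[-1])
    return chosen


# An individual with three times the fitness should be drawn about three times as often.
picks = roulette_select(["a", "b"], [1.0, 3.0], 4000, random.Random(42))
```

Passing a seeded `random.Random` instance keeps runs reproducible; the standard library's `random.choices` with `weights` provides the same behavior in one call.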
1. Name : Survey7
2. Title : Optimization of Association Rule Mining using Improved Genetic Algorithms
3. Issue : 2004 IEEE International Conference on Systems, Man and Cybernetics
4. Authors : Manish Saggar, Ashish Kumar Agrawal, Abhimanyu Lad (undergraduate students, B.Tech 4th year, Indian Institute of Information Technology, Allahabad, India). msaggar-01@iiita.ac.in, ashir.h-agr82@yahoo.co.in, alad-01@iiita.ac.in
5. Objective : To optimize the rules generated by association rule mining (Apriori method) using genetic algorithms.
6. Methodology : Genetic algorithms with modifications
1. The individuals are represented using the Michigan approach, i.e. each individual encodes a single rule.
2. The rule antecedent is represented using binary encoding.
3. Genetic operators.
4. For selection, the Roulette Wheel Sampling procedure is used.
5. Fitness function:
   Confidence Factor, CF = TP / (TP + FP)
   Comp = TP / (TP + FN)
   Fitness = CF x Comp
   Fitness = w1 x (CF x Comp) + w2 x Simp
   where Simp is a measure of rule simplicity (normalized to take on values in the range 0..1) and w1 and w2 are user-defined weights.
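The fitness function above can be written out as a small worked example. The sketch follows the formulas as stated; the function name, the default weights, and the choice to return 0 for a term whose denominator is empty are assumptions added for illustration.

```python
def rule_fitness(tp, fp, fn, simp=0.0, w1=1.0, w2=0.0):
    """Fitness = w1 * (CF * Comp) + w2 * Simp, from true/false positive and negative counts."""
    # CF = TP / (TP + FP); treated as 0 when the rule covers no examples (assumption).
    cf = tp / (tp + fp) if (tp + fp) else 0.0
    # Comp = TP / (TP + FN); treated as 0 when no positive examples exist (assumption).
    comp = tp / (tp + fn) if (tp + fn) else 0.0
    return w1 * (cf * comp) + w2 * simp


# A rule with 8 true positives, 2 false positives and 8 false negatives:
# CF = 0.8 and Comp = 0.5, so CF x Comp = 0.4.
plain = rule_fitness(tp=8, fp=2, fn=8)
weighted = rule_fitness(tp=8, fp=2, fn=8, simp=1.0, w1=0.7, w2=0.3)
```

With the default weights (w1 = 1, w2 = 0) the function reduces to the simpler form Fitness = CF x Comp.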
7. Results :
The rules evolved by the system after applying Apriori association rule mining and GAs in succession contain some rules with negations in the attributes, as predicted and desired.
8. Conclusion : The results generated when the technique is applied to the synthetic database include the desired rules, i.e. rules containing the negation of the attributes as well as the general rules evolved from association rule mining. The authors believe that the toolkit can also handle other databases after minor modifications. As for future work, the authors are currently working on reducing the complexity of genetic algorithms by using distributed computing.
1. Name : Survey8
2. Title : Immune Optimization based Genetic Algorithm for incremental association rules mining
3. Issue : 2009 International Conference on Artificial Intelligence and Computational Intelligence
4. Authors : Genxiang Zhang, Software School of Xiamen University, Xiamen, Fujian, PR China, zhanggenxiang2007@yahoo.cn; Haishan Chen, Software School of Xiamen University, Xiamen, Fujian, PR China, hschen@xmu.edu.cn
5. Objective : The paper proposes an IOGA (Immune Optimization based Genetic Algorithm) approach for incremental association rules mining on large and frequently updated data sets.
6. Methodology : We introduce dynamic immune evolution and the biomimetic mechanisms of Engineering Immune Computing (EIC), namely immune recognition, immune memory, and immune regulation, into GA. Immune recognition is critical in the immune system; its essence is to distinguish self from nonself, which can be evaluated by the affinity between antibodies and antigens.
7. Data Set : The experimental data set comes from a company's daily records of the APIs (in the local computer operating system) called by outside files from the network, together with the result of whether the files led to a computer virus.
8. Results : IOGA can not only keep old strong rules that have high support and confidence, but also discover new critical rules even though their support is not high.
9. Conclusion : First, it overcomes the deficiencies of GA and improves the mining efficiency through the sampling process; second, redundant rules can be avoided based on GA, and rules without insignificant items (or attributes) can be presented to users directly through the attribute reduction process; finally, new interesting rules with low support in the additional data set can be detected, while old rules that have high support and confidence can be kept.
1. Name : Survey9
2. Title : Mining Multi-Class datasets using Genetic Relation Algorithm for Rule Reduction
3. Issue : 2009 IEEE Congress on Evolutionary Computation (CEC 2009)
4. Authors : Eloy Gonzales, Non-Member, IEEE, Shingo Mabu, Member, IEEE, Karla Taboada, Member, IEEE, Kaoru Shimada, Member, IEEE, Kotaro Hirasawa, Member, IEEE
5. Objective : This paper describes the use of a new evolutionary method named Genetic Relation Algorithm (GRA) for reducing the number of class association rules extracted by other methods.
6. Methodology : Flow chart
7. Data Set : Two datasets from the UCI ML Repository [9] were used to conduct the experiments: the Lymphography and Vehicle datasets.
8. Results : Fig. 9 and Fig. 10 show that when the reduction rate is small, GRA achieves accuracy comparable to that of the large set of rules (that is, 100% of the rules), especially with the partial match. Furthermore, the accuracy does not change drastically compared to the accuracy of the large set of rules.
9. Conclusion : The main features of our proposed method are as follows. GRA can be usefully applied to reduce a large set of class association rules and to improve the classification accuracy. Our method can be integrated with other conventional association rule mining methods, since the input can be any large set of rules. The classification accuracy is improved especially when the partial match is used in the classifier. For future work, we plan to extend this method to eliminate redundant rules in general association rules, that is, association rules with a non-fixed consequent (multiple items), using the GRA with directed branches.
1. Name : Survey10
2. Title : A Method for Finding Implicating Rules Based on the Genetic Algorithm
3. Issue : Third International Conference on Natural Computation (ICNC 2007)
4. Authors : Zhou Jun, Li Shu-you, Mei Hong-yan, Liu Hai-xia, Department of Computer Science, Liaoning University of Technology, 121001, China. E-mail: lnjunzhou@163.com
5. Objective : An algorithm for finding optimized rules based on the genetic algorithm is presented. An approach for finding implicating rules based on the genetic algorithm is proposed.
6. Methodology : Genetic Algorithm for Finding Implicating Rules (GAFIR)
Algorithm GAFIR
Input: database D, threshold of the strength of implication, the largest evolved generation GEN, population size N, crossover probability Pc, mutation probability Pm
Output: rule set (RS)
Procedure GAFIR
1. L0 = Initial(D, N);
2. TR = GetRules(M, threshold)
3. For i = 1 to GEN
4. Begin
5.   C = Crossover(Li-1, Pc)
6.   Li = Mutation(C, Pm)
7.   TR = TR U GetRules(Li, threshold)
8. End
9. RS = TR;
7. Data Set :
We use the algorithm presented above to test the performance of the system on the database of Car Test Results (CTR).
8. Results : The number of interesting rules levels off after about 400 generations of evolution. Generations 1 to 200 are the phase in which interesting implication rules are discovered frequently. The count then levels off, and by 700 generations nearly all the interesting rules have been discovered. The greater the fitness threshold, the fewer interesting implication rules are extracted; conversely, the smaller the fitness threshold, the more interesting implication rules are extracted.
9. Conclusion : This paper gives a method to judge the relevance and degree of correlation of rules with implication analysis. It can find implicating rules, and judge and denote concurrent rules. It can also prune the search space of rules with a degree of fitness defined on the strength of implication, and simplifies the measurement of rules.
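The control flow of the GAFIR procedure in the Survey10 methodology can be sketched as a loop over caller-supplied operators. This is only an outline of the loop structure: the notes do not specify `Initial`, `GetRules`, `Crossover` or `Mutation`, so they are passed in as hypothetical callables (with Pc, Pm and the threshold captured inside them), and the toy operators in the example are invented purely to make the sketch runnable.

```python
def gafir(initial_population, get_rules, crossover, mutation, generations):
    """Accumulate rules over GEN generations using caller-supplied (hypothetical) operators."""
    population = initial_population            # L0
    rules = set(get_rules(population))         # TR = GetRules(...)
    for _ in range(generations):               # For i = 1 to GEN
        candidates = crossover(population)     # C = Crossover(L(i-1), Pc)
        population = mutation(candidates)      # Li = Mutation(C, Pm)
        rules |= set(get_rules(population))    # TR = TR U GetRules(Li, threshold)
    return rules                               # RS = TR


# Toy stand-ins: individuals are integers, "rules" are individuals at or above a threshold.
rule_set = gafir(
    initial_population=[0, 1],
    get_rules=lambda pop: {x for x in pop if x >= 2},
    crossover=lambda pop: list(pop),
    mutation=lambda pop: [x + 1 for x in pop],
    generations=3,
)
```

Keeping the rule set as a union across generations mirrors step 7 of the procedure: a rule found in any generation is retained even if the individuals that produced it later disappear.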
1. Name : Survey11
2. Title : Efficient Distributed Genetic Algorithm for Rule Extraction
3. Issue : Eighth International Conference on Hybrid Intelligent Systems 2008
4. Authors : Antonio Peregrin (peregrin@dti.uhu.es) and Miguel Angel Rodriguez (miguel.rodriguez@dti.uhu.es), Dept. of Information Technologies, University of Huelva
5. Objective : This paper presents an efficient distributed genetic algorithm for classification rule extraction in data mining, based on a new method of dynamic data distribution applied to parallelism using networks of computers in order to mine large datasets.
6. Methodology :
EDGAR uses a local GA in each node, with some communication with the neighborhood to exchange individuals and poorly covered examples.
Generate initial population using seeding
While (Stop Criteria)
  For a number of generations
    Select g individuals by US
    For each individual
      If % Perform recombination
      If % Perform mutation
    end
    replace g individual from population
    Exchange individuals
    Exchange training examples
  end
end
Extract set of rules by greedy algorithm
Send set of rules to Central Pool
If (not improving) reduce training data
End
7. Data Set : For the experimental study, a well-known problem has been chosen from UCI [16]: Nursery. This dataset has 12,960 instances, big enough to test data distribution. Nursery is a complex dataset with 6 characteristics and 5 unbalanced classes, three of which represent more than 97% of the dataset.
8. Results :
We can point out the following. The execution time (see figure 4) of the proposed algorithm shows a considerable speedup and better behavior than the compared algorithm as the number of processors increases. Classification accuracy is similar in both algorithms and does not follow any trend relative to the number of processors. The number of rules generated is between 60% and 80% smaller in EDGAR.
9. Conclusion : In this preliminary study, EDGAR shows a considerable speedup, and this improvement does not compromise the accuracy and quality of the classifier. Finally, we would like to remark on the absence of a master process to guide the search.
1. Name : Survey12
2. Title : Searching for Rules to find Defective Modules in Unbalanced Data Sets
3. Issue : 2009 International Symposium on Search Based Software Engineering
4. Authors : D. Rodríguez (Dept. of Computer Science, University of Alcala, Alcala de Henares, Madrid, Spain; daniel.rodriguezg@uah.es), J.C. Riquelme (Dept. of Computer Science, University of Seville, Seville, Spain; riquelme@us.es), R. Ruiz and J.S. Aguilar-Ruiz (Dept. of Computer Science, Pablo de Olavide University, Seville, Spain; robertoruizaguilar}@upo.es)
5. Objective : In this work, we use data mining techniques to search for rules that indicate modules with a high probability of being defective, using data sets from the PROMISE repository.
6. Methodology : We first applied feature selection (attribute selection) to work only with those attributes from the data sets capable of predicting defective modules. With the reduced data set, a genetic algorithm is used to search for rules characterising modules with a high probability of being defective.
7. Data Set : The CM1, KC1, KC2, and PC1 data sets available in the PROMISE repository are used.
8. Results :
9. Conclusion :