Data Mining Using Evolutionary Algorithm
Submitted by:
Name: Apar Parajuli
Group: CE
Roll no.: 37
Level: 4th yr / I sem.
Submitted to:
KATHMANDU UNIVERSITY
Department of Computer Science and Engineering
Dhulikhel, Kavre
Abstract
With huge amounts of data being generated in the world every day, at a rate far higher than
that at which humans alone can analyze it, data mining becomes an extremely
important task for extracting as much useful information from this data as possible. The
standard data mining techniques are satisfactory to a certain extent, but they are constrained by
certain limitations, and it is in these cases that evolutionary approaches are both more capable
and more efficient. In this paper I present the application of nature-inspired evolutionary techniques to
data mining, augmented with human interaction, to handle situations in which concept
definitions are abstract and hard to pin down, and hence not quantifiable in an absolute sense. Finally, I
propose some ideas for future implementations of these techniques.
CONTENTS
Abstract
1. Introduction
2. Objectives
3. Methodology
4. Discussion
5. Conclusion
1 Introduction
In recent years, the massive growth in the amount of stored data has increased the demand
for effective data mining methods to discover the hidden knowledge and patterns in these
data sets. Data mining means to “mine”, or extract, the information in the available data
that is relevant to the user. The idea is not new: patterns have been extracted from data by
hand for centuries, for problems such as regression analysis or knowledge discovery
from records of various types. As computers spread into almost every field of
human knowledge and occupation, their advantages were widely advocated, but it
soon became clear that the increasing amounts of data that could be generated,
stored and analysed called for some way to sift through them and extract the
important information. In the earlier days a person or a group of people would sit down
to analyse the data by going through it manually and using statistical techniques, but
data was being generated far faster than it could realistically be processed by hand.
This led to the emergence of the field of data mining, which essentially aims to define and
formalize standard techniques for extracting knowledge from large data warehouses. As data mining
evolved it was observed that the data at hand was almost never clean enough to
be fed directly to data mining engines and needed several steps of pre-processing before it could be
put through “mining”. The problems were generally inconsistent data formats,
noisy or incorrect values, and unnecessary or redundant data. The pre-processing steps clean and
integrate the data, discretize it, and select the most relevant attributes before any mining is performed. A
whole new area called intelligent data analysis has emerged, which uses efficient
techniques for mining large data sets while keeping in mind both that the knowledge obtained must be
useful and that the time for mining is constrained, since the user
requires results as soon as possible. Some of the methods used to mine data include support
vector machines, decision trees, nearest neighbour analysis, Bayesian classification, and
latent semantic analysis. Given the problems associated with conventional data mining
techniques, clever new ways to overcome them were needed, and the application of AI
techniques to the field resulted in a very powerful hybrid. Evolutionary
optimization techniques provided a useful and novel solution to these issues, and once
data mining was enhanced with evolutionary computation (EC), many of these problems were
no longer major issues. Some applications of evolutionary algorithms in data mining that
involve human interaction are presented in this paper. When dealing with concepts that
are abstract and hard to define, or with cases where there is a large or variable number of
parameters, we still do not have reliable methods for finding solutions. In certain cases
where we are unable to quantify what we want to measure, for instance ‘beauty’ in images
or ‘pleasantness’ in music, we almost always require a human to drive the solutions through
their choices. In these situations we combine evolutionary computation
with data mining, with a human interacting with the engine to steer the
computation towards the solutions or answers they are looking for. This paper begins by
describing some concepts in data mining and general evolutionary algorithms.
In the later sections we discuss some of the areas where
these are implemented, and lastly we give a few ideas of where these techniques may be
implemented in the future.
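The human-in-the-loop idea described above can be illustrated with a minimal sketch in Python. It is not taken from any of the systems discussed in this paper. The function names, the colour example and the simulated user are all assumptions made for illustration; in a real interactive system the `choose` step would show the candidates to a person and wait for their selection.

```python
import random

def interactive_evolution(population, choose, mutate, generations, rng):
    """Evolve candidates using a person's choices in place of a fitness function."""
    for _ in range(generations):
        # The human (or a stand-in) picks the candidates they prefer.
        parents = choose(population)
        # Refill the population with mutated copies of the chosen parents.
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(len(population) - len(parents))]
        population = parents + children
    return population


# --- Hypothetical stand-ins so the sketch runs without a real user ---
TARGET = (200, 120, 40)  # the colour the simulated "user" finds pleasant

def simulated_user(population, keep=4):
    """Stand-in for a person: keeps the colours closest to a hidden taste."""
    def distance(colour):
        return sum((a - b) ** 2 for a, b in zip(colour, TARGET))
    return sorted(population, key=distance)[:keep]

def mutate_colour(colour, rng):
    """Nudge each RGB channel by a small random amount, clamped to 0..255."""
    return tuple(min(255, max(0, ch + rng.randint(-20, 20))) for ch in colour)

rng = random.Random(0)
start = [tuple(rng.randint(0, 255) for _ in range(3)) for _ in range(12)]
final = interactive_evolution(start, simulated_user, mutate_colour, 30, rng)
best = simulated_user(final, keep=1)[0]
```

Because the chosen parents are carried over unchanged, the best candidate never gets worse from one generation to the next, which matters when every evaluation costs a person's attention.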
2 Objectives
The main objectives of this report include:
3 Methodology
The most effective way I found to collect information on the chosen subject matter was
through the web. Google turned out to be very helpful for understanding the concepts of data
mining, knowledge discovery and the various kinds of evolutionary algorithms.
4 Discussion
Data integration: removes redundant and inconsistent data from data that is
collected from different sources.
Attribute selection: selects the data relevant to the analysis process from all the
data sets.
Data mining: after all the previous steps, data mining algorithms or
techniques can be applied to the data in order to extract the desired knowledge.
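A minimal Python sketch of the integration and attribute selection steps is given here. It assumes records are plain dictionaries and treats only exact duplicates as redundant; the shop data and the function names are hypothetical, and real integration would also have to reconcile inconsistent values.

```python
def integrate(*sources):
    """Merge records from several sources, dropping exact duplicates."""
    seen, merged = set(), []
    for source in sources:
        for record in source:
            key = tuple(sorted(record.items()))  # hashable fingerprint
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged

def select_attributes(records, attributes):
    """Keep only the attributes relevant to the analysis."""
    return [{a: r[a] for a in attributes} for r in records]

# Hypothetical records collected from two different sources.
shop_a = [{"id": 1, "age": 34, "city": "Dhulikhel", "spent": 120},
          {"id": 2, "age": 51, "city": "Kathmandu", "spent": 80}]
shop_b = [{"id": 2, "age": 51, "city": "Kathmandu", "spent": 80},  # duplicate
          {"id": 3, "age": 27, "city": "Banepa", "spent": 200}]

records = integrate(shop_a, shop_b)
prepared = select_attributes(records, ["age", "spent"])
```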
4.3 Data mining tasks
It is very important to define the data mining task that the algorithm should address before
designing it for application to a particular problem. There are several data mining tasks,
and each has a specific purpose in terms of the knowledge to be discovered
(Freitas, 2002).
Clustering Task
Clustering simply means grouping: placing data instances into different groups or
clusters such that instances in the same cluster are similar to one another and easily
distinguished from the instances that belong to other clusters (Zaki et al., 2010).
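As one concrete illustration of this task, the following Python sketch uses k-means, a standard clustering method that is not itself discussed in this paper. The sample points, the choice of k = 2 and the helper names are assumptions for illustration only.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def kmeans(points, k, iterations=20, seed=0):
    """Group points into k clusters around iteratively updated centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
```

On this well-separated sample the three points near the origin end up in one cluster and the three points near (8.5, 8.5) in the other.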
occurrence together (Tan et al., 2006). This means that if item A is in the basket,
there is a high probability that item B will be in the basket as well.
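The "high probability" in this statement is usually measured by the confidence of the rule A -> B, that is, the fraction of baskets containing A that also contain B, alongside the support of the item set. A minimal Python sketch with made-up baskets follows; the data and function names are illustrative assumptions.

```python
def count(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets)

def support(baskets, items):
    """Fraction of all baskets that contain the item set."""
    return count(baskets, items) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    with_antecedent = count(baskets, antecedent)
    if with_antecedent == 0:
        return 0.0  # the rule never applies in this data
    both = count(baskets, set(antecedent) | set(consequent))
    return both / with_antecedent

# Hypothetical market baskets.
baskets = [{"bread", "butter"},
           {"bread", "butter", "milk"},
           {"bread", "jam"},
           {"milk"},
           {"bread", "butter", "jam"}]

rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
```

Here bread appears in four of the five baskets and butter accompanies it in three of those four, so the rule bread -> butter has a confidence of 3/4.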
Figure 1
The relation in the concept relation dictionary is like a rule and can be acquired by inductive
learning if training examples are available. To do so, words are extracted from the document
by lexical analysis, and these words are checked against the expressions in the key concept
dictionary. Thus we have the following assumptions: concept classes are attributes,
concepts are values, and the test classes given by the reader are the result classes we want;
together these form a training example. For all those attributes that do not have values, 0 is
assigned. An overview of this is depicted in the figure below.
Figure 2
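The construction of a training example described above can be sketched in Python as follows. The dictionary contents, the report text and the function name are hypothetical, and the lexical analysis is reduced to lower-casing and simple phrase matching, which is far cruder than what a real system would use.

```python
import re

# Hypothetical key concept dictionary:
# concept class -> concept -> expressions that signal the concept.
KEY_CONCEPTS = {
    "Sales": {"good sales": ["sold out", "sells well"],
              "bad sales": ["did not sell", "unsold"]},
    "Stock": {"shortage": ["out of stock", "ran short"]},
    "Display": {"changed": ["rearranged", "moved the display"]},
}

def make_training_example(report, result_class, dictionary=KEY_CONCEPTS):
    """Turn one report into attribute values plus the reader's class."""
    # Crude lexical analysis: lower-case the text and keep only the words.
    text = " ".join(re.findall(r"[a-z']+", report.lower()))
    example = {}
    for concept_class, concepts in dictionary.items():
        example[concept_class] = 0  # attribute without a value
        for concept, expressions in concepts.items():
            if any(expr in text for expr in expressions):
                example[concept_class] = concept
                break
    example["class"] = result_class
    return example

report = "The new jackets sold out by noon, and we ran short of the small size."
example = make_training_example(report, "missed")
```

Each concept class becomes one attribute, so every training example has the same fixed set of attributes regardless of how long the report is.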
For the inductive learning to work we need a fuzzy algorithm, because reports
written by humans are not strict in their descriptions. Thus the method
described for the learning is the IDF algorithm, which is a fuzzy algorithm. This
algorithm makes rules from the generated training examples, and the generated
rules have the genotype of a tree.
The whole process can be seen in Figure 3 below, which shows the inputs and the
processes that produce the final outputs from the input dictionaries and data.
Figure 3
The algorithm was tested on daily business reports concerning retail sales, which were
classified into 3 classes describing a sales opportunity as best, missed or other. The key
concept dictionary was composed of 13 concept classes, and each concept class has its
subset of concepts. Reports that contained contradicting descriptions were
regarded as unnecessary, and no training examples were generated from them.
Using 10-fold cross-validation, the results showed that they were able to
generate the concept relation dictionary successfully and obtain better results than IDF on the reports
generated for retailing.
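For readers unfamiliar with 10-fold cross-validation, the following Python sketch shows the general procedure. It does not reproduce the experiment described above: the labels are made up, and a majority-class learner stands in for the real rule learner.

```python
import random
from collections import Counter

def k_fold_splits(n_examples, k=10, seed=0):
    """Yield (train, test) index lists; each example is tested exactly once."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def cross_validate(examples, labels, train_fn, predict_fn, k=10):
    """Average accuracy over k folds (assumes at least k examples)."""
    accuracies = []
    for train, test in k_fold_splits(len(examples), k):
        model = train_fn([examples[i] for i in train],
                         [labels[i] for i in train])
        correct = sum(predict_fn(model, examples[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

# Stand-in learner: always predicts the most common training label.
def train_majority(examples, labels):
    return Counter(labels).most_common(1)[0][0]

def predict_majority(model, example):
    return model

# Hypothetical data: 50 reports labelled best, missed or other.
examples = list(range(50))
labels = ["best"] * 30 + ["missed"] * 10 + ["other"] * 10
accuracy = cross_validate(examples, labels, train_majority, predict_majority)
```

The majority-class baseline gives a floor against which a real learner's cross-validated accuracy can be compared.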
5 Conclusion
Data mining in today's world has many new uses and has become crucial in many
fields. Huge amounts of data have been accumulated, from which many useful patterns can
be found. This large search space is where evolutionary algorithms come into play, since they
can search for patterns in these data very efficiently. Many conventional methods
are not as efficient as this approach. That is why using evolutionary algorithms in data
mining can be really helpful for using and learning from data efficiently.
6 References
http://Wikipedia.org/evolutionaryalgorithm
http://Wikipedia.org/datamining
http://Wikipedia.org/datamining/dataminingtasks
http://aitopics.org/machinelearning/geneticalgorithm
Freitas, A. A. (2002). Data Mining and Knowledge Discovery with Evolutionary
Algorithms. Berlin: Springer-Verlag.