Testdocu
Data mining is the process of finding useful patterns in large amounts of data. This paper discusses a few data mining
techniques and algorithms, along with some of the organizations that have adopted data mining technology to improve their
businesses and obtained excellent results.
Keywords: Data mining techniques; Data mining algorithms; Data mining applications.
1. Overview of Data Mining
The development of information technology has generated a large number of databases and huge volumes of
data in various areas. Research in databases and information technology has given rise to approaches for
storing and manipulating this valuable data for further decision making. Data mining is the process of
extracting useful information and patterns from huge volumes of data. It is also called the knowledge
discovery process, knowledge mining from data, knowledge extraction, or data/pattern analysis.
Figure 1. Knowledge discovery Process
Data mining is a logical process used to search through large amounts of data in order to find
useful information. The goal of this technique is to find patterns that were previously unknown. Once these
patterns are found, they can be used to make decisions that help an organization develop its business.
The three steps involved are:
Exploration
Pattern identification
Deployment
Exploration: In the first step, the data is cleaned and transformed into another form, and the
important variables and the nature of the data are determined based on the problem.
Pattern Identification: Once the data has been explored, refined, and defined for the specific variables, the
second step is to identify patterns and choose those that make the best prediction.
Deployment: The patterns are deployed to achieve the desired outcome.
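The three steps above can be illustrated with a minimal Python sketch. It is not taken from the paper: the records, the variable names (age, monthly_visits, made_purchase), and the simple "value above the mean" rule are all assumptions chosen only to make each step concrete.

```python
# Hypothetical raw records: (age, monthly_visits, made_purchase).
# None marks a missing value. The data is invented for illustration only.
RAW = [
    (25, 2, False), (40, 9, True), (None, 7, True), (33, 1, False),
    (51, 8, True), (29, None, False), (62, 3, False), (45, 10, True),
]


def explore(rows):
    """Exploration: drop incomplete rows, then split each row into
    a tuple of input variables and a target label."""
    clean = [r for r in rows if None not in r]
    features = [r[:-1] for r in clean]
    labels = [r[-1] for r in clean]
    return features, labels


def identify_pattern(features, labels):
    """Pattern identification: build one 'value above the mean' rule per
    variable and keep the rule that predicts the labels best."""
    best = None
    for i in range(len(features[0])):
        column = [f[i] for f in features]
        cutoff = sum(column) / len(column)
        hits = sum((v > cutoff) == y for v, y in zip(column, labels))
        score = hits / len(labels)
        # Ties are resolved in favour of the earlier variable.
        if best is None or score > best[2]:
            best = (i, cutoff, score)
    return best  # (variable index, cutoff, accuracy on the explored data)


def deploy(pattern, record):
    """Deployment: apply the chosen rule to a new, unlabeled record."""
    index, cutoff, _ = pattern
    return record[index] > cutoff


features, labels = explore(RAW)
pattern = identify_pattern(features, labels)
print(pattern)
print(deploy(pattern, (30, 7)))
```

On this toy data the sketch selects the monthly_visits variable, because its rule matches every cleaned record. A real project would use far richer cleaning and a proper model, but the division of work between the three functions mirrors the three steps.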
2. Data Mining Algorithms and Techniques
Various algorithms and techniques, such as classification, clustering, regression, artificial
intelligence, neural networks, association rules, decision trees, genetic algorithms, and the nearest neighbor
method, are used for knowledge discovery from databases.
2.1. Classification
Classification is the most commonly applied data mining technique. It employs a set of pre-classified
examples to develop a model that can classify the population of records at large. Fraud detection and
credit-risk applications are particularly well suited to this type of analysis. This approach frequently
employs decision tree or neural network-based classification algorithms. The data classification process
involves two steps: learning and classification. In the learning step, the training data are analyzed by a
classification algorithm. In the classification step, test data are used to estimate the accuracy of the
classification rules. If the accuracy is acceptable, the rules can be applied to new data tuples. For a fraud
detection application, the training data would include complete records of both fraudulent and valid
activities, determined on a record-by-record basis. The classifier-training algorithm uses these
pre-classified examples to determine the set of parameters required for proper discrimination. The
algorithm then encodes these parameters into a model called a classifier.
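The learning and classification steps can be sketched in a few lines of Python. This is an illustrative assumption, not the paper's method: the transaction amounts are made up, the "classifier" is a single learned amount threshold (a one-level decision tree), and the acceptance level MIN_ACCURACY is a hypothetical value.

```python
# Made-up pre-classified records: (transaction amount, label). Not real data.
TRAIN = [(12.0, "valid"), (25.0, "valid"), (40.0, "valid"), (55.0, "valid"),
         (900.0, "fraud"), (1200.0, "fraud"), (750.0, "fraud"), (60.0, "valid")]
TEST = [(30.0, "valid"), (1000.0, "fraud"), (45.0, "valid"), (800.0, "fraud")]
MIN_ACCURACY = 0.9  # hypothetical level at which the rule counts as acceptable


def classify(amount, threshold):
    """Apply the classifier: amounts above the threshold are flagged as fraud."""
    return "fraud" if amount > threshold else "valid"


def learn_threshold(records):
    """Learning step: choose the threshold (the model's only parameter)
    that misclassifies the fewest training records."""
    amounts = sorted(a for a, _ in records)
    # Candidate thresholds are midpoints between consecutive sorted amounts.
    candidates = [(lo + hi) / 2 for lo, hi in zip(amounts, amounts[1:])]

    def errors(t):
        return sum(classify(a, t) != label for a, label in records)

    return min(candidates, key=errors)


def accuracy(records, threshold):
    """Classification step: estimate accuracy on held-out test records."""
    correct = sum(classify(a, threshold) == label for a, label in records)
    return correct / len(records)


threshold = learn_threshold(TRAIN)
test_accuracy = accuracy(TEST, threshold)

# Apply the rule to new, unlabeled tuples only if the accuracy is acceptable.
if test_accuracy >= MIN_ACCURACY:
    predictions = [classify(a, threshold) for a in (20.0, 5000.0)]
else:
    predictions = []
print(threshold, test_accuracy, predictions)
```

Keeping the test records separate from the training records is the point of the second step: accuracy measured on the training data alone would overstate how well the rule generalizes to new tuples.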
Types of classification models:
Classification by decision tree induction
Bayesian Classification
Neural Networks
Support Vector Machines (SVM)
Classification Based on Associations