Data Mining Chapter 1
Sift through all the chaotic and repetitive noise in your data.
Understand what is relevant and then make good use of that
information to assess likely outcomes.
Accelerate the pace of making informed decisions.
1.3 What Kinds of Data Can Be Mined?
As a general technology, data mining can be applied to any kind of
data as long as the data are meaningful for a target application. The
most basic forms of data for mining applications are database data
(Section 1.3.1), data warehouse data (Section 1.3.2), and
transactional data (Section 1.3.3). The concepts and techniques
presented in this book focus on such data. Data mining can also be
applied to other forms of data (e.g., data streams, ordered/sequence
data, graph or networked data, spatial data, text data, multimedia
data, and the WWW). We present an overview of such data in Section
1.3.4. Techniques for mining of these kinds of data are briefly
introduced in Chapter 13.
1.4 What Kinds of Patterns Can Be Mined?
We have observed various types of data and information repositories on
which data mining can be performed. Let us now examine the kinds of
patterns that can be mined. There are a number of data mining
functionalities. These include characterization and discrimination
(Section 1.4.1); the mining of frequent patterns, associations, and
correlations (Section 1.4.2); classification and regression (Section
1.4.3); clustering analysis (Section 1.4.4); and outlier analysis
(Section 1.4.5). Data
mining functionalities are used to specify the kinds of patterns to be
found in data mining tasks. In general, such tasks can be classified into
two categories: descriptive and predictive. Descriptive mining tasks
characterize properties of the data in a target data set. Predictive mining
tasks perform induction on the current data in order to make predictions.
Data mining functionalities, and the kinds of patterns they can discover,
are described below. In addition, Section 1.4.6 looks at what makes a
pattern interesting. Interesting patterns represent knowledge.
KDD Process in Data Mining
Last Updated: 20-08-2019
Data Mining – Knowledge Discovery in Databases (KDD).
Why do we need Data Mining?
The volume of information we have to handle is increasing every day. It
comes from business transactions, scientific data, sensor data, pictures,
videos, etc. So, we need a system that is capable of extracting the
essence of the available information and that can automatically generate
reports, views, or summaries of data for better decision-making.
Why is Data Mining used in Business?
Data mining is used in business to make better managerial decisions by:
Automatically summarizing data.
Extracting the essence of the stored information.
Discovering patterns in raw data.
Data Mining, also known as Knowledge Discovery in Databases, refers
to the nontrivial extraction of implicit, previously unknown, and
potentially useful information from data stored in databases.
Steps Involved in KDD Process:
[Figure: KDD process]
Unsupervised Learning
In unsupervised learning, the data used for training is neither classified nor
labelled. Unsupervised learning studies how systems can infer a function that
describes hidden structure in unlabelled data. Its main task is to find
patterns in the data.
Once a model has learned these patterns, it can assign new data to the groups
it found, typically in the form of clusters. The system is never told the right
output; instead it explores the data and draws inferences that describe its
hidden structure.
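The idea of finding clusters in unlabelled data can be illustrated with a minimal sketch of k-means, one common clustering method (the text above does not name a specific algorithm, so this choice is an assumption). The function name kmeans_1d, its parameters, and the sample values are hypothetical and exist only for this illustration.

```python
import random

def kmeans_1d(points, k, iterations=20, seed=0):
    """Group 1-D points into k clusters; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(p - centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return centroids, labels

# No labels are supplied; the grouping is inferred from the values alone.
readings = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
centroids, labels = kmeans_1d(readings, k=2)
```

Note that the number of clusters k is chosen by the analyst here; the algorithm discovers which points belong together, not how many groups exist.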
Reinforcement Learning
Reinforcement learning is a type of machine learning that allows software
agents and machines to automatically determine the ideal behaviour within a
specific context in order to maximize their performance. There is no labelled
dataset and there are no known results associated with the data, so the only
way to perform a given task is to learn from experience.
For every correct action or decision, the algorithm receives positive
reinforcement (a reward), whereas for every incorrect action it receives
negative reinforcement (a penalty). In this way, it learns which actions to
perform and which to avoid. Reinforcement learning is therefore used primarily
in industrial automation and in the gaming sector.
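Learning from rewards and penalties can be sketched with a very small example: an epsilon-greedy agent choosing between two actions, one of which always yields a reward of +1 and the other a penalty of -1. This is a deliberately simplified setting (a so-called bandit problem, with no states), and the function name train_bandit, its parameters, and the reward values are assumptions made for illustration.

```python
import random

def train_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent; returns the estimated value of each action."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # current value estimate per action
    counts = [0] * len(rewards)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: pick the action that looks best so far.
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]      # feedback from the toy environment
        counts[action] += 1
        # Incremental mean: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

# Action 0 is "incorrect" (penalty), action 1 is "correct" (reward).
estimates = train_bandit([-1.0, 1.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
```

The agent is never told which action is correct; it only sees the reward that follows each choice, and its estimates come to favour the rewarded action.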
Data mining is not an easy task: the algorithms used can get very
complex, and the data is not always available in one place. It needs to be
integrated from various heterogeneous data sources. These factors also
create some issues. The major issues discussed here concern performance.
Performance Issues
Performance-related issues include the following:
Efficiency and scalability of data mining algorithms − To effectively
extract information from the huge amounts of data in databases, data mining
algorithms must be efficient and scalable.
Parallel, distributed, and incremental mining algorithms − Factors such
as the huge size of databases, the wide distribution of data, and the
complexity of data mining methods motivate the development of parallel and
distributed data mining algorithms. These algorithms divide the data into
partitions, which are then processed in parallel. The results from the
partitions are then merged. Incremental algorithms incorporate database
updates without mining the entire data again from scratch.
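The partition, process, and merge pattern, together with incremental updating, can be sketched as follows. The mining task chosen here (counting how many transactions contain each item) and all function names are assumptions made for illustration. A thread pool is used only to keep the sketch short; a real workload would more likely use separate processes or machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_items(transactions):
    """Mine one partition: count how many transactions contain each item."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count each item once per transaction
    return counts

def mine_in_partitions(transactions, n_partitions=2):
    """Split the data, mine each partition in parallel, then merge."""
    partitions = [transactions[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_counts = list(pool.map(count_items, partitions))
    merged = Counter()
    for partial in partial_counts:
        merged.update(partial)  # Counter.update adds the counts together
    return merged

def update_incrementally(counts, new_transactions):
    """Fold new transactions into existing counts without re-mining old data."""
    updated = counts.copy()
    updated.update(count_items(new_transactions))
    return updated

old_data = [["milk", "bread"], ["bread", "butter"],
            ["milk", "bread", "butter"], ["milk"]]
counts = mine_in_partitions(old_data)
counts = update_incrementally(counts, [["bread", "jam"]])
```

Merging works here because item counts from separate partitions can simply be added; mining tasks whose partial results do not combine this directly need a more careful merge step.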