DM Unit1 Intro

DATA MINING
UNIT -1
INTRODUCTION
We are living in the information age, which in practice means living in the data age.
Terabytes and petabytes of data pour into computer networks, the World Wide Web, and various data
storage devices every day from business, society, science and engineering, medicine, and
almost every other aspect of daily life.
In short, data mining is the search for interesting patterns (knowledge) in data.

Data mining is the process of discovering interesting patterns and knowledge from large
amounts of data.
Data mining refers to extracting, or mining, knowledge from large databases. Data mining and
knowledge discovery in databases is a relatively new interdisciplinary field that merges ideas from
statistics, machine learning, databases, and parallel computing.
Fig: Data mining, searching for knowledge (interesting patterns) in data
The data source can include databases, data warehouses, the Web, other information
repositories, or data that are streamed into the system dynamically.
Data mining has been defined in several ways:
1. Data mining is the non-trivial extraction of implicit, previously unknown, and
potentially useful information from data.
2. Data mining is the search for relationships and global patterns that exist in large
databases but are hidden among vast amounts of data.
3. Data mining refers to using a variety of techniques to identify nuggets of information or
decision-making knowledge in a database and extracting them in such a way that they
can be put to use in areas such as decision support, prediction, forecasting, and
estimation.
4. Data mining is the process of discovering meaningful new correlations, patterns, and trends
by sifting through large amounts of data stored in repositories, using pattern recognition
techniques as well as statistical and mathematical techniques.
Data mining is also known as:
- Knowledge extraction
- Knowledge mining from data
- KDD (knowledge discovery from data)
- Data archaeology
- Data dredging
- Data / pattern analysis
The volume of information we must handle grows every day: business transactions,
scientific data, sensor data, pictures, videos, and so on. We therefore need systems capable of
extracting the essence of the available information and automatically generating reports, views,
or summaries of data for better decision-making. Such a system supports:
- Automatic summarization of data
- Extracting the essence of the stored information
- Discovering patterns in raw data
Many people treat data mining as a synonym for another popularly used term, knowledge discovery from data,
or KDD. Others view data mining as merely an essential step in the process of knowledge discovery. The
knowledge discovery process is an iterative sequence of the following steps:
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the database)
4. Data transformation (where data are transformed and consolidated into forms
appropriate for mining by performing summary or aggregation operations)
5. Data mining (where intelligent methods are applied to extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge,
based on interestingness measures)
7. Knowledge presentation (where visualization and knowledge representation
techniques are used to present mined knowledge to users)
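The seven steps above can be sketched as a chain of small functions. This is only an illustrative toy, not a real KDD system: the purchase records, the function names, and the 0.5 support threshold are all assumptions made up for the example.

```python
from collections import Counter

# Two hypothetical sources of (customer, item) records; None marks a noisy entry.
store_a = [("c1", "bread"), ("c1", "milk"), ("c2", "bread"), ("c2", None)]
store_b = [("c3", "Bread "), ("c3", "milk"), ("c4", "eggs")]

def clean(records):
    """Step 1: drop records with missing values and normalize item names."""
    return [(c, i.strip().lower()) for c, i in records if c and i]

def integrate(*sources):
    """Step 2: combine multiple data sources into one list."""
    return [r for src in sources for r in src]

def select(records, wanted_items):
    """Step 3: keep only the records relevant to the analysis task."""
    return [r for r in records if r[1] in wanted_items]

def transform(records):
    """Step 4: aggregate records into one item set (basket) per customer."""
    baskets = {}
    for customer, item in records:
        baskets.setdefault(customer, set()).add(item)
    return baskets

def mine(baskets):
    """Step 5: count how many baskets contain each item."""
    return Counter(item for items in baskets.values() for item in items)

def evaluate(counts, n_baskets, min_support):
    """Step 6: keep only patterns whose support meets the threshold."""
    return {item: n / n_baskets for item, n in counts.items()
            if n / n_baskets >= min_support}

def present(patterns):
    """Step 7: format the mined knowledge for the user."""
    return [f"{item}: support {s:.2f}" for item, s in sorted(patterns.items())]

records = integrate(clean(store_a), clean(store_b))
records = select(records, {"bread", "milk", "eggs"})
baskets = transform(records)
patterns = evaluate(mine(baskets), len(baskets), min_support=0.5)
report = present(patterns)
print("\n".join(report))
```

In practice the process is iterative: a disappointing result at step 6 usually sends the analyst back to an earlier step with different data or thresholds.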
Data mining can be applied to any kind of data, including:
- Databases
- Data warehouses
- Transactional data
Other forms of data include:
- Data streams
- Ordered / sequence data
- Graph / networked data
- Text data
- Spatial data
- Multimedia data
- Web data (WWW)
Data mining techniques support automatic exploration of data. They attempt to uncover
patterns and trends in the data, and to derive rules from these patterns, which help the user
review and examine decisions in a related business or scientific area.
Having looked at the types of data and information repositories on which data
mining can be performed, we now turn to the kinds of patterns that can be mined.
There are a number of data mining functionalities. These include:
- Characterization and discrimination
- Mining of frequent patterns, associations, and correlations
- Classification and regression
- Clustering analysis
- Outlier analysis
Data mining functionalities are used to specify the kinds of patterns to be found in data
mining tasks. Such tasks fall into two categories: descriptive and predictive.
Descriptive tasks characterize properties of the data in a target data set.
Predictive tasks perform induction on the current data in order to make
predictions.
Descriptions of individual classes or concepts in summarized, concise, and precise
terms are called class/concept descriptions. These descriptions can be derived via:
1. Data characterization
2. Data discrimination
3. Both data characterization and discrimination
Data Characterization:
- It is a summarization of the general characteristics of a target class of data.
- The data corresponding to the user-specified class are collected by a database
query.
The output of data characterization can be presented in various forms, such as:
- Pie charts
- Bar charts
- Curves
- Multidimensional cubes
- Multidimensional tables
The resulting descriptions can also be presented as generalized relations or in rule
form, called characteristic rules.
Data Discrimination:
Data discrimination is a comparison of the general features of target class data objects with the general
features of objects from one or a set of contrasting classes.
The output of data discrimination can be presented in the same manner as data
characterization. Discrimination descriptions expressed in rule form are referred to as
discriminant rules.
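A small sketch may help separate the two ideas: characterization summarizes one target class, while discrimination puts that summary next to the summary of a contrasting class. The customer records, the class names "frequent" and "rare", and both function names are hypothetical, and the list filter merely stands in for the database query mentioned above.

```python
from statistics import mean

# Hypothetical customer records: (class label, age, yearly spend).
customers = [
    ("frequent", 30, 5000), ("frequent", 40, 6000), ("frequent", 35, 5500),
    ("rare", 50, 900), ("rare", 20, 400), ("rare", 62, 1100),
]

def characterize(records, target):
    """Summarize the general characteristics of one target class."""
    rows = [r for r in records if r[0] == target]  # stands in for a database query
    return {
        "count": len(rows),
        "avg_age": mean(r[1] for r in rows),
        "avg_spend": mean(r[2] for r in rows),
    }

def discriminate(records, target, contrast):
    """Pair each summary feature of the target class with the contrasting class."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {key: (t[key], c[key]) for key in t}

print(characterize(customers, "frequent"))
print(discriminate(customers, "frequent", "rare"))
```

The dictionaries returned here play the role of a generalized relation; the same numbers could equally be drawn as a bar chart or stated as a rule.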
Association Analysis :
Association analysis is widely used to discover hidden patterns in large data sets.
The relationships it uncovers can be represented as association
rules or as sets of frequent items. Frequent itemsets are collections of items
that often occur together. Association rules express interesting
relationships: a rule states that a strong relationship exists between two or more
items. Correlation analysis can additionally be used to check whether the associated
items are statistically related.
Example: market basket analysis, which finds items that customers frequently buy together.
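For market basket analysis, the strength of a rule such as "bread => milk" is usually judged with two standard measures, support and confidence. The notes do not define them, so the following is a minimal sketch under that assumption; the four baskets are invented for illustration.

```python
# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule "bread => milk": how often both occur, and how often milk follows bread.
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

Here bread and milk appear together in 2 of 4 baskets, and 2 of the 3 baskets containing bread also contain milk.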
Classification Analysis :
Classification is the process of finding a model (or function) that describes and
distinguishes data classes or concepts.
- The model is derived from the analysis of a set of training data.
- The model is used to predict the class label of objects whose class label is unknown.
- The model can be represented in various forms:
  - Classification rules (IF-THEN)
  - Decision trees (flowchart-like tree structures)
  - Mathematical formulae or neural networks
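As a minimal illustration of deriving a model from training data, the sketch below learns a single IF-THEN rule (a one-level decision tree, often called a decision stump). The training set of study hours and pass/fail labels is hypothetical, and a practical classifier would use many attributes and a more careful learning procedure.

```python
# Hypothetical training data: (hours studied, class label).
training = [(1, "fail"), (2, "fail"), (3, "fail"),
            (5, "pass"), (6, "pass"), (8, "pass")]

def train_stump(data):
    """Find the threshold t for which 'IF x >= t THEN pass ELSE fail' is most accurate."""
    best_t, best_correct = None, -1
    for t in sorted({x for x, _ in data}):
        # Count training examples the candidate rule labels correctly.
        correct = sum((label == "pass") == (x >= t) for x, label in data)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

def predict(threshold, x):
    """Apply the learned IF-THEN rule to an object whose label is unknown."""
    return "pass" if x >= threshold else "fail"

threshold = train_stump(training)
print(f"IF hours >= {threshold} THEN pass ELSE fail")
print(predict(threshold, 4), predict(threshold, 7))
```

The learned threshold is the model; the `predict` function is how that model assigns a class label to new objects.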
Clustering Analysis:
Clustering analyzes data objects without consulting class labels. It can be used to
generate class labels for a group of data.
Objects are clustered or grouped based on the principle of maximizing the intraclass similarity
and minimizing the interclass similarity.
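One common way to pursue that principle is k-means, which the notes do not name; it is used here only as an illustrative technique. The sketch works on one-dimensional points, and the data and the starting centers are assumptions chosen to keep the example small.

```python
def kmeans_1d(points, centers, iterations=10):
    """Tiny k-means for 1-D data: assign points to the nearest center, then update centers."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Index of the center closest to this point.
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # New center = mean of the cluster; keep the old center if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

No class labels were supplied; the two groups that emerge can afterwards be treated as labels for the data.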
Outlier Analysis :
A data set may contain objects that do not comply with the general behaviour or
model of the data. These objects are outliers. Many data mining methods discard outliers as
noise or exceptions.
In some applications (e.g., fraud detection), however, rare events can be more interesting than
regularly occurring ones. The analysis of outlier data is referred to as outlier analysis
or anomaly mining.
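A simple statistical way to flag such objects is the z-score test: mark any value that lies more than a chosen number of standard deviations from the mean. This is one of many possible approaches, and both the transaction amounts and the threshold of 2.0 are assumptions for illustration.

```python
from statistics import mean, pstdev

def z_score_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # all values identical, so nothing can be an outlier
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts; one is far larger than the rest.
amounts = [20, 22, 19, 21, 23, 20, 500]
suspicious = z_score_outliers(amounts)
print(suspicious)
```

In a fraud detection setting the flagged amount would be passed on for review rather than discarded as noise.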
Data mining is considered an interdisciplinary field. It draws on a set of
disciplines such as statistics, database systems, machine learning, visualization, and
information science. Classifying data mining systems helps users understand the
systems and match their requirements to them.
To help users understand a system and meet the desired requirements, data mining systems can be
classified according to the following criteria:
- The types of databases mined
- The application adapted
- The types of techniques utilized
- The types of knowledge mined
A data mining task can be specified in the form of a data mining query, which is input
to the data mining system. A data mining query is defined in terms of data mining task
primitives.
These primitives allow the user to communicate interactively with the data mining
system during discovery, in order to direct the mining process or examine the findings from
different angles or depths.
The data mining primitives specify:
1. The set of task-relevant data to be mined.
2. The kind of knowledge to be mined.
3. The background knowledge to be used in the discovery process.
4. The interestingness measures and thresholds for pattern evaluation.
5. The representation for visualizing the discovered patterns.
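To make the five primitives concrete, a query can be pictured as a record with one field per primitive. The class below is a hypothetical container invented for this illustration, not the syntax of any real data mining query language; the field names, the example values, and the helper function are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class MiningQuery:
    """Hypothetical holder for the five task primitives."""
    task_relevant_data: str                                     # 1. which data to mine
    knowledge_kind: str                                         # 2. e.g. "association"
    background_knowledge: dict = field(default_factory=dict)    # 3. e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)         # 4. measures and thresholds
    presentation: str = "table"                                 # 5. how to show patterns

def passes_thresholds(query, support, confidence):
    """Check a candidate pattern against the thresholds given in the query."""
    limits = query.interestingness
    return (support >= limits.get("min_support", 0.0)
            and confidence >= limits.get("min_confidence", 0.0))

query = MiningQuery(
    task_relevant_data="sales transactions",
    knowledge_kind="association",
    background_knowledge={"city": "country"},
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    presentation="rules",
)
print(passes_thresholds(query, support=0.10, confidence=0.80))
print(passes_thresholds(query, support=0.01, confidence=0.90))
```

Changing any one field and rerunning the task corresponds to the interactive steering described above.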
The major issues in data mining fall into five groups, listed under the headings that follow.
Mining Methodology
- Mining various and new kinds of knowledge
- Mining knowledge in multi-dimensional space
- Data mining as an interdisciplinary effort
- Boosting the power of discovery in a networked environment.
- Handling uncertainty, noise or incompleteness of data
- Pattern evaluation & pattern or constraint guided mining.
User Interaction
- Interactive Mining
- Incorporation of background knowledge
- Ad hoc data mining & data mining query languages.
- Presentation & visualization of data mining results.
Efficiency & Scalability
- Efficiency & scalability of data mining algorithms.
- Parallel, distributed & incremental mining algorithms.
Diversity of Database Types
- Handling complex types of data.
- Mining dynamic, networked & global data repositories.
Data Mining & Society
- Social impacts of data mining
- Privacy preserving data mining
- Invisible data mining.