DM Unit1 Intro
UNIT -1
INTRODUCTION
By
Living in the information age really means living in the data age.
Terabytes or petabytes of data pour into computer networks, the World Wide Web and various
data storage devices every day from business, society, science and engineering, medicine and
almost every other aspect of daily life.
Data mining refers to extracting, or mining, knowledge from large databases. Data mining and
knowledge discovery in databases is a new interdisciplinary field that merges ideas from
statistics, machine learning, databases and parallel computing.
The data source can include databases, data warehouses, the Web, other information
repositories, or data that are streamed into the system dynamically.
1. Data mining is the non-trivial extraction of implicit, previously unknown and
potentially useful information from data.
2. Data mining is the search for relationships and global patterns that exist in large
databases but are hidden among vast amounts of data.
3. Data mining refers to using a variety of techniques to identify nuggets of information or
decision-making knowledge in the database and extracting them in such a way that they
can be put to use in areas such as decision support, prediction, forecasting and
estimation.
4. Data mining is the process of discovering meaningful new correlations, patterns and trends
by sifting through large amounts of data stored in repositories, using pattern recognition
techniques as well as statistical and mathematical techniques.
The volume of information we must handle increases every day, from business transactions,
scientific data, sensor data, pictures, videos, etc. We therefore need a system capable of
extracting the essence of the available information and automatically generating reports,
views or summaries of data for better decision-making.
Data mining is often treated as a synonym for another popularly used term, knowledge
discovery from data, or KDD. Others view data mining as merely an essential step in the
process of knowledge discovery. The knowledge discovery process is an iterative sequence of
the following steps:
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the database)
4. Data transformation (where data are transformed and consolidated into forms
appropriate for mining by performing summary or aggregation operations)
5. Data mining (where intelligent methods are applied to extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge
based on interestingness measures)
7. Knowledge presentation (where visualization and knowledge representation
techniques are used to present mined knowledge to users)
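The seven steps above can be sketched as a chain of small functions. This is only an
illustrative toy: the two "stores", their records, the function names and the minimum-total
threshold are all assumptions made for the example, and the "mining" step is a trivial
aggregation rather than a real mining algorithm.

```python
from collections import Counter

# Two hypothetical data sources with noisy, inconsistent records.
store_a = [
    {"customer": "c1", "item": "Milk ", "qty": 2},
    {"customer": "c2", "item": "bread", "qty": None},  # missing value
    {"customer": "c3", "item": "milk", "qty": 1},
]
store_b = [
    {"customer": "c4", "item": "BREAD", "qty": 3},
    {"customer": "c5", "item": "milk", "qty": 1},
    {"customer": "c6", "item": "eggs", "qty": 12},
]

def clean(records):
    """Step 1: drop records with missing quantities, normalise item names."""
    return [
        {**r, "item": r["item"].strip().lower()}
        for r in records
        if r["qty"] is not None
    ]

def integrate(*sources):
    """Step 2: combine several sources into one collection."""
    return [r for source in sources for r in source]

def select(records, items):
    """Step 3: keep only the records relevant to the analysis task."""
    return [r for r in records if r["item"] in items]

def transform(records):
    """Step 4: consolidate by aggregating the quantity sold per item."""
    totals = Counter()
    for r in records:
        totals[r["item"]] += r["qty"]
    return totals

def mine(totals):
    """Step 5: stand-in for a mining method; a 'pattern' is (item, total)."""
    return list(totals.items())

def evaluate(patterns, min_total):
    """Step 6: keep patterns that pass a simple interestingness threshold."""
    return [p for p in patterns if p[1] >= min_total]

def present(patterns):
    """Step 7: render the surviving patterns for the user."""
    return [f"{item}: {total} units" for item, total in sorted(patterns)]

data = integrate(clean(store_a), clean(store_b))
relevant = select(data, {"milk", "bread"})
report = present(evaluate(mine(transform(relevant)), min_total=4))
print(report)
```

In practice the process is iterative: the result of pattern evaluation often sends the
analyst back to an earlier step, for example to select different data.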
Data mining can be applied to any kind of data, for example:
- Databases
- Data warehouses
- Transactional data
Other forms of data include:
- Data streams
- Ordered / sequence data
- Graph / networked data
- Text data
- Spatial data
- Multimedia data
- Web data (WWW)
Data mining techniques support automatic exploration of data. They attempt to uncover
patterns and trends in the data, and to derive rules from these patterns, which help the user
review and examine decisions in a related business or scientific area.
We have observed the various types of data and information repositories on which data
mining can be performed, and the kinds of patterns that can be mined.
Data discrimination is a comparison of the general features of the target class data objects
with the general features of objects from one or a set of contrasting classes.
The output of data discrimination can be presented in the same manner as data
characterization. Discrimination descriptions expressed in rule form are referred to as
discriminant rules.
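As a rough illustration of data discrimination, the sketch below compares how one attribute
is distributed in a target class and in a contrasting class. The customer records, the
attribute name `age_group` and the helper names are hypothetical, chosen only for the example.

```python
from collections import Counter

def profile(records, attribute):
    """Share of each attribute value within one class of objects."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def discriminate(target, contrast, attribute):
    """Pair up (target share, contrast share) for every attribute value."""
    t = profile(target, attribute)
    c = profile(contrast, attribute)
    return {v: (t.get(v, 0.0), c.get(v, 0.0)) for v in sorted(set(t) | set(c))}

# Hypothetical classes: frequent buyers (target) vs. rare buyers (contrast).
frequent_buyers = [
    {"age_group": "young"}, {"age_group": "young"},
    {"age_group": "young"}, {"age_group": "senior"},
]
rare_buyers = [
    {"age_group": "senior"}, {"age_group": "senior"},
    {"age_group": "senior"}, {"age_group": "young"},
]

comparison = discriminate(frequent_buyers, rare_buyers, "age_group")
for value, (target_share, contrast_share) in comparison.items():
    print(f"{value}: target {target_share:.0%}, contrast {contrast_share:.0%}")
```

A large gap between the two shares for a value suggests that the value helps tell the target
class apart from the contrasting class.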
Association Analysis :
Association analysis is widely used to discover hidden patterns in large data sets.
The uncovered relationships can be represented as association rules or as sets of
frequent items. Identifying interesting associations in large databases is closely related
to correlation analysis. These relationships take two forms: frequent itemsets and
association rules. Frequent itemsets are collections of items that often occur together.
Association rules are a way of expressing these relationships: a rule indicates that a
strong relationship exists between two or more items.
Example : Market Basket Analysis.
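A minimal market basket sketch follows. The notes do not name specific measures, so the
example assumes the two commonly used ones, support and confidence; the transactions and
function names are made up for illustration.

```python
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent (rule: antecedent -> consequent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_pairs(transactions, min_support):
    """All item pairs whose support reaches the assumed threshold."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            result[pair] = s
    return result

print(frequent_pairs(transactions, min_support=0.5))
print(round(confidence({"bread"}, {"milk"}, transactions), 2))
```

Here bread and milk appear together in three of the five baskets, and three of the four
baskets containing bread also contain milk, which is what the rule "bread -> milk" expresses.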
Classification Analysis :
Classification is the process of finding a model (or function) that describes and
distinguishes data classes or concepts.
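To make "finding a model that distinguishes data classes" concrete, here is a small sketch
using a nearest-centroid classifier. This particular method is an assumption chosen for
brevity, not one named in these notes, and the training data and labels are invented.

```python
def fit_centroids(samples, labels):
    """Learn the model: the mean point (centroid) of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))

# Hypothetical labelled objects described by two numeric attributes.
samples = [(1.0, 1.0), (1.5, 2.0), (8.0, 9.0), (9.0, 8.0)]
labels = ["low", "low", "high", "high"]

centroids = fit_centroids(samples, labels)
print(predict(centroids, (2.0, 2.0)))
print(predict(centroids, (7.0, 8.0)))
```

The learned centroids play the role of the model: they describe each class and are then
used to label objects whose class is unknown.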
Outlier Analysis :
A data set may contain objects that do not comply with the general behaviour or
model of the data. These objects are outliers. Many data mining methods discard outliers as
noise or exceptions.
In some applications (e.g., fraud detection), however, the rare events can be more
interesting than the regularly occurring ones. The analysis of outlier data is referred to
as outlier analysis or anomaly mining.
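One simple way to flag outliers is a z-score test, sketched below. The choice of method, the
threshold of 2.0 and the transaction amounts are assumptions for illustration; real outlier
analysis often uses more robust statistical, distance-based or density-based methods.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations
    away from the mean of the data set."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # all values identical, so nothing can stand out
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts; one is far from the rest.
amounts = [20, 22, 19, 21, 23, 20, 500, 22]
print(find_outliers(amounts))
```

In a fraud detection setting the flagged amount is exactly the record of interest, so it
would be investigated rather than discarded as noise.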
A data mining task can be specified in the form of a data mining query, which is input
to the data mining system. A data mining query is defined in terms of data mining task
primitives.
These primitives allow the user to communicate interactively with the data mining
system during discovery, in order to direct the mining process or examine the findings from
different angles or depths.
The data mining primitives specify:
1. The set of task-relevant data to be mined.
2. The kind of knowledge to be mined.
3. The background knowledge to be used in the discovery process.
4. The interestingness measures and thresholds for pattern evaluation.
5. The representation for visualizing the discovered patterns.
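The five primitives can be pictured as the fields of a query object. The sketch below is a
hypothetical structure, not the syntax of any actual data mining query language; the class
name, field names and example values are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class MiningQuery:
    """Hypothetical container with one field per task primitive."""
    task_relevant_data: str                                    # primitive 1
    knowledge_kind: str                                        # primitive 2
    background_knowledge: dict = field(default_factory=dict)   # primitive 3
    interestingness: dict = field(default_factory=dict)        # primitive 4
    presentation: str = "table"                                # primitive 5

query = MiningQuery(
    task_relevant_data="sales transactions for one region",
    knowledge_kind="association",
    background_knowledge={"concept_hierarchy": {"milk": "dairy", "butter": "dairy"}},
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    presentation="rules",
)
print(query.knowledge_kind, query.interestingness)
```

Changing a single field, such as raising a threshold or switching the presentation form,
corresponds to the interactive steering of the mining process described above.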
Mining Methodology
- Mining various and new kinds of knowledge
- Mining knowledge in multi-dimensional space
- Data mining as an interdisciplinary effort
- Boosting the power of discovery in a networked environment.
- Handling uncertainty, noise or incompleteness of data
- Pattern evaluation & pattern- or constraint-guided mining.
User Interaction
- Interactive Mining
- Incorporation of background knowledge
- Ad hoc data mining & data mining query languages.
- Presentation & visualization of data mining results.
Efficiency & Scalability
- Efficiency & scalability of data mining algorithms.
- Parallel, distributed & incremental mining algorithms.
Diversity of Database Types
- Handling complex types of data.
- Mining dynamic, networked & global data repositories.
Data Mining & Society
- Social impacts of data mining
- Privacy-preserving data mining
- Invisible data mining.