Vinee

P M
Overview
Introduction
Explanation of Data Mining Techniques
Advantages
Applications
Privacy
DATA MINING: EXTRACTING THE KNOWLEDGE FROM THE DATABASE
KDD: KDD stands for Knowledge Discovery in Databases. KDD identifies hidden correlations in the data.
Data collection (1960s): using computers, tapes and disks
Data access (1980s): using RDBMS, SQL
Data warehousing & decision support (1990s): using OLAP, DWH
Data mining (today): using advanced algorithms, microprocessors, etc.
Data Warehousing
Data Warehouse: a repository (or archive) of information gathered from multiple sources, stored under a unified schema, at a single site.
Collect data
Store in a single repository
Allows for easier query development, as a single repository can be queried.
DATA WAREHOUSING:

Data warehousing supports OLAP operations.
OLAP stands for Online Analytical Processing.
It stores historical data.
Its access pattern is read-only.
It deals with long-term operations.
OLAP operations are roll-up, drill-down, slicing and dicing.
It is highly flexible and has relatively few users.
It is subject-oriented.
OLAP provides a very good view of what is happening, but it cannot predict what will happen in the future or explain why it is happening.
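The four OLAP operations named above can be illustrated with a small sketch. The fact table, its dimension names (year, quarter, region, product) and the helper functions are all hypothetical and chosen only for illustration; a real warehouse would run these as queries against an OLAP engine.

```python
from collections import defaultdict

# Hypothetical fact table: one row per sale, with a "sales" measure.
SALES = [
    {"year": 2023, "quarter": "Q1", "region": "East", "product": "TV", "sales": 100},
    {"year": 2023, "quarter": "Q2", "region": "East", "product": "TV", "sales": 150},
    {"year": 2023, "quarter": "Q1", "region": "West", "product": "Radio", "sales": 80},
    {"year": 2024, "quarter": "Q1", "region": "East", "product": "Radio", "sales": 60},
    {"year": 2024, "quarter": "Q1", "region": "West", "product": "TV", "sales": 120},
]

def aggregate(rows, dims):
    """Sum the sales measure, grouped by the given list of dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row["sales"]
    return dict(totals)

def slice_rows(rows, dim, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dim] == value]

def dice_rows(rows, conditions):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows
            if all(r[d] in allowed for d, allowed in conditions.items())]

# Roll-up: summarise from the quarter level up to the year level.
by_year = aggregate(SALES, ["year"])
# Drill-down: go back from the year level to the finer quarter level.
by_quarter = aggregate(SALES, ["year", "quarter"])
# Slice: keep only the East region.
east_only = slice_rows(SALES, "region", "East")
# Dice: keep only TVs sold in the East region.
east_tv = dice_rows(SALES, {"region": {"East"}, "product": {"TV"}})
```

Roll-up and drill-down are the same aggregation at coarser or finer granularity, while slice and dice only filter rows; none of them modifies the stored data, which matches the read-only access pattern described above.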
1. Data Cleaning
2. Data Integration
3. Data Transformation
4. Data Reduction
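As a rough illustration of these four steps, the sketch below runs them in order on two tiny tables. The tables, field names and the specific choices made (filling a missing age with the mean, min-max scaling, dropping one attribute) are assumptions for the example, not methods prescribed by the slides.

```python
from statistics import mean

# Hypothetical source tables from two different systems.
customers = [
    {"id": 1, "age": 25, "city": "A"},
    {"id": 2, "age": None, "city": "B"},   # missing value to be cleaned
    {"id": 3, "age": 45, "city": "A"},
]
purchases = [
    {"id": 1, "amount": 200.0},
    {"id": 2, "amount": 50.0},
    {"id": 3, "amount": 350.0},
]

# 1. Data cleaning: fill the missing age with the mean of the known ages.
known_ages = [c["age"] for c in customers if c["age"] is not None]
mean_age = mean(known_ages)
cleaned = [{**c, "age": mean_age if c["age"] is None else c["age"]}
           for c in customers]

# 2. Data integration: join the two sources on the shared customer id.
amount_by_id = {p["id"]: p["amount"] for p in purchases}
integrated = [{**c, "amount": amount_by_id[c["id"]]} for c in cleaned]

# 3. Data transformation: min-max normalise the amount into [0, 1].
low = min(r["amount"] for r in integrated)
high = max(r["amount"] for r in integrated)
transformed = [{**r, "amount": (r["amount"] - low) / (high - low)}
               for r in integrated]

# 4. Data reduction: keep only the attributes needed for mining.
reduced = [{"id": r["id"], "age": r["age"], "amount": r["amount"]}
           for r in transformed]
```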
Discovery of Knowledge
Steps:
Business Understanding: what problem are we trying to solve? What is the business trying to achieve?
Data Understanding: do we have the data to be able to answer these questions? If not, what is the cost of acquiring that additional information?
Data Preparation: all data is dirty and needs to be cleaned and transformed. This is the heavy lifting stage.
Analysis & Modeling: the tools must be chosen based on what the business is trying to understand and the data available.
Evaluate Outcomes: how well does the model actually work from a statistical point of view (significance) and from a business point of view (actionability)?
Deployment: driving the insight into the business.
Data Mining Techniques

Classification
Clustering
Regression
Association Rules
Classification: Given a set of items that have several classes, and given the past instances (training instances) with their associated class, classification is the process of predicting the class of a new item. The goal is therefore to classify the new item and identify to which class it belongs.
Example: A bank wants to classify its Home Loan Customers into groups according to their response to bank advertisements. The bank might use the classifications Responds Rarely, Responds Sometimes, Responds Frequently. The bank will then attempt to find rules about the customers that respond Frequently and Sometimes. The rules could be used to predict needs of potential customers.
Technique for Classification

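Decision-Tree Classifiers
[Figure: decision tree. The root node splits on Job (Engineer, Doctor, Carpenter). Each job branch then splits on Income (thresholds shown in the figure: <30K, >50K, <40K, >90K, <50K, >100K), and each income branch ends in a leaf labeled Bad or Good.]
Predicting credit risk of a person with the jobs specified.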
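A decision tree of this shape can be evaluated with a few lines of code. The sketch below is only an illustration: the figure does not survive well enough to tell which income thresholds belong to which job, so the single cut-off per job used here is a hypothetical value, and `credit_risk` is a made-up helper name.

```python
# Hypothetical income cut-off for each job branch (assumed values, not the
# thresholds from the original figure).
INCOME_THRESHOLD = {
    "Engineer": 50_000,
    "Doctor": 90_000,
    "Carpenter": 40_000,
}

def credit_risk(job, income):
    """Walk a two-level tree: split on job first, then on income."""
    if job not in INCOME_THRESHOLD:
        raise ValueError(f"the tree has no branch for job {job!r}")
    # Leaf decision: income at or above the cut-off is a good risk.
    return "Good" if income >= INCOME_THRESHOLD[job] else "Bad"

# The same income can lead to different classes on different branches.
engineer_result = credit_risk("Engineer", 60_000)
doctor_result = credit_risk("Doctor", 60_000)
```

In practice the splits and thresholds would be learned from training instances rather than written by hand; the code only shows how a new item is classified once the tree exists.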
Clustering
Clustering algorithms find groups of items that are similar. They divide a data set so that records with similar content are in the same group, and groups are as different as possible from each other. (2)
Example: An insurance company could use clustering to group clients by their age, location and types of insurance purchased. The categories are unspecified, and this is referred to as unsupervised learning.
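One common clustering algorithm is k-means, sketched below under simplifying assumptions: the client records (age, number of policies held) are invented, the initial centroids are simply the first k points, and squared Euclidean distance is used. The slides do not name a specific algorithm, so this is one possible choice rather than the method they describe.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on numeric tuples; starts from the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged, assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical clients: (age, number of policies held).
clients = [(22, 1), (25, 1), (24, 2), (58, 4), (61, 5), (63, 4)]
centroids, clusters = kmeans(clients, k=2)
```

No class labels are supplied anywhere: the two groups (younger clients with few policies, older clients with several) emerge from the data alone, which is what makes this unsupervised.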
Regression
Regression deals with the prediction of a value, rather than a class.
Example: Find out if there is a relationship between smoking patients and cancer-related illness.
It can be used to smooth noisy data.
Given values: X1, X2, ..., Xn
Objective: predict variable Y
One way is to estimate coefficients a0, a1, a2, ..., an such that
Y = a0 + a1X1 + a2X2 + ... + anXn
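For the simplest case of a single predictor (n = 1), the coefficients a0 and a1 can be estimated by ordinary least squares. The sketch below uses the closed-form solution; the data points are invented and `fit_line` is an illustrative helper name.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a0, a1) for the model Y = a0 + a1*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a0 = mean_y - a1 * mean_x
    return a0, a1

# Hypothetical noisy observations lying close to Y = 1 + 2X.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
a0, a1 = fit_line(xs, ys)

# Predict Y for a new value of X.
predicted = a0 + a1 * 6
```

With several predictors the same idea applies, but the coefficients are usually obtained by solving a system of linear equations rather than with this two-line formula.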
[Figure: example regression graphs showing a line of best fit and curve fitting.]
Association Rules
An association algorithm creates rules that describe
how often events have occurred together. (2)

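Example: When a customer buys a hammer, then 90% of the time they will buy nails.
Ex: computer => antivirus software [support = 2%, confidence = 60%].
Support(A => B) = P(A ∪ B), the probability that a transaction contains both A and B.
Confidence(A => B) = P(B | A), the probability that a transaction containing A also contains B.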
Support: a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule. Support is one measure of rule interestingness. A support of 2% means that 2% of the transactions under analysis show that a computer and antivirus software are purchased together.
Confidence: a measure of how often the consequent is true when the antecedent is true. It is another measure of rule interestingness. Example: a confidence of 60% means that 60% of the customers who purchased the computer also purchased the software.
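Both measures can be computed directly from a list of transactions. The sketch below uses a handful of invented baskets, so the resulting numbers differ from the 2% and 60% quoted in the slides; `support` and `confidence` are illustrative helper names.

```python
# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"computer", "antivirus"},
    {"computer", "antivirus", "printer"},
    {"computer"},
    {"printer"},
    {"computer", "antivirus"},
]

def support(antecedent, consequent, baskets):
    """Fraction of all baskets that contain both antecedent and consequent."""
    both = antecedent | consequent
    return sum(1 for b in baskets if both <= b) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Fraction of baskets containing the antecedent that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0
    both = antecedent | consequent
    return sum(1 for b in with_antecedent if both <= b) / len(with_antecedent)

# Rule: computer => antivirus
rule_support = support({"computer"}, {"antivirus"}, transactions)
rule_confidence = confidence({"computer"}, {"antivirus"}, transactions)
```

Here three of the five baskets contain both items, and three of the four baskets containing a computer also contain antivirus software. An association algorithm would typically keep only rules whose support and confidence exceed chosen minimum thresholds.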
ADVANTAGES:
Provides new knowledge from existing data: public databases, government sources, company databases
Old data can be used to develop new knowledge: weather forecasting, insurance, government, health care
New knowledge can be used to improve services or products
Improvements lead to: bigger profits, more efficient service
Uses of Data Mining

Sales/Marketing: diversify target market; identify clients' needs to increase response rates
Risk Assessment: identify customers that pose high credit risk
Fraud Detection: identify people misusing the system, e.g. people who have two Social Security Numbers
Customer Care: identify customers likely to change providers; identify customer needs
Financial data analysis
Manufacturing industry
Telecommunication industry
Biological data analysis
Scientific applications
Retail
Intrusion detection
What data mining has done for...

Scheduled its workforce to provide faster, more accurate answers to questions.
Reduced direct mail costs by 30% while garnering 95% of the campaign's revenue.
Applications of Data Mining

[Figure omitted] (4)
Source: IDC 1998
Privacy Concerns

Effective data mining requires large sources of data.
To achieve a wide spectrum of data, multiple data sources are linked.
Linking sources can be problematic for privacy. For example, if the following histories of a customer were linked:
Shopping history
Credit history
Bank history
Employment history
then the user's life story could be painted from the collected data.
References
1. Silberschatz, Korth, Sudarshan, Database System Concepts, 5th Edition, McGraw-Hill, 2005
2. Two Crows, Data Mining Glossary, http://www.twocrows.com/glossary.htm
3. Wikipedia, Data mining, http://en.wikipedia.org/wiki/Data_mining
4. http://phoenix.phys.clemson.edu/tutorials/excel/regression.html
5. http://wwwmaths.anu.edu.au/~steve/pdcn.pdf
