Lecture 2 Data Mining Functions
Knowledge Discovery (KDD) Process
This is a view from the typical database systems and data
warehousing communities
Data mining plays an essential role in the knowledge
discovery process
[Figure: KDD process. Stages shown: Databases, Data Cleaning, Data Integration, Task-relevant Data, Data Mining, Pattern Evaluation]
Cubic view of Data
Aggregation Hierarchies
© Prentice Hall
Data Warehousing
“Subject-oriented, integrated, time-variant, nonvolatile”
William Inmon
Operational Data: Data used for the day-to-day needs of a
company.
Informational Data: Supports other functions such as
planning and forecasting.
Data mining tools often access data warehouses rather than
operational data.
Operational vs. Informational
               Operational Data      Data Warehouse
Application    OLTP                  OLAP
Use            Precise Queries       Ad Hoc
Temporal       Snapshot              Historical
Modification   Dynamic               Static
Orientation    Application           Business
Data           Operational Values    Integrated
Size           Gigabits              Terabits
Level          Detailed              Summarized
Access         Often                 Less Often
Response       Few Seconds           Minutes
Data Schema    Relational            Star/Snowflake
OLAP
Online Analytical Processing (OLAP): supports more complex
queries than OLTP.
Online Transaction Processing (OLTP): traditional
database/transaction processing.
Dimensional data; cube view
Visualization of operations:
Slice: examine sub-cube.
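Dice: examine a sub-cube restricted on two or more dimensions; some texts instead use the term for rotating the cube to look at another dimension (also called pivot).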
Roll Up/Drill Down
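As a rough illustration of these cube operations, the sketch below stores a tiny cube as a Python dictionary keyed by (city, quarter, product). The data, the dimension names, and the helper functions `slice_cube` and `roll_up` are hypothetical, chosen only for illustration; they are not taken from any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales cube: (city, quarter, product) -> units sold.
DIMENSIONS = ("city", "quarter", "product")
cube = {
    ("Chicago", "Q1", "TV"): 10,
    ("Chicago", "Q1", "PC"): 5,
    ("Chicago", "Q2", "TV"): 7,
    ("Dallas", "Q1", "TV"): 3,
    ("Dallas", "Q2", "PC"): 8,
}

def slice_cube(cube, dimension, value):
    """Slice: keep only the cells where one dimension equals a fixed value."""
    i = DIMENSIONS.index(dimension)
    return {key: v for key, v in cube.items() if key[i] == value}

def roll_up(cube, keep):
    """Roll up: sum the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, v in cube.items():
        totals[tuple(key[i] for i in idx)] += v
    return dict(totals)

q1 = slice_cube(cube, "quarter", "Q1")                # sub-cube for one quarter
by_city = roll_up(cube, ["city"])                     # coarse summary
by_city_quarter = roll_up(cube, ["city", "quarter"])  # drill down from by_city
```

Drill down is treated here simply as rolling up to a finer set of dimensions; a real warehouse would typically also follow an aggregation hierarchy inside a single dimension (for example, month to quarter to year).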
OLAP Operations
Roll Up
Drill Down
Data Mining in Business Intelligence
[Figure: pyramid showing increasing potential to support business decisions toward the top. Recoverable layers: Decision Making (End User); Data Exploration (Statistical Summary, Querying, and Reporting)]
Multi-Dimensional View of Data Mining
Data to be mined
Database data (extended-relational, object-oriented, heterogeneous,
legacy), data warehouse, transactional data, stream, spatiotemporal,
time-series, sequence, text and web, multi-media, graphs & social and
information networks
Knowledge to be mined (or: Data mining functions)
Characterization, discrimination, association, classification, clustering,
trend/deviation, outlier analysis, etc.
Descriptive (mining tasks characterize properties of the data in a
target data set.) vs. predictive data mining (mining tasks perform
induction on the current data in order to make predictions).
Multiple/integrated functions and mining at multiple levels
Techniques utilized
Data-intensive, data warehouse (OLAP), machine learning, statistics,
pattern recognition, visualization, high-performance, etc.
FUNCTION OF DATA MINING
Applications of Data Mining
Spatial Data Analysis
Information Retrieval
Pattern Recognition
Image Analysis
Signal Processing
Computer Graphics
Web Technology
Business
Bioinformatics
Data Mining Function: (1) Generalization
CLASS ACTIVITY
Rule 1) coke, burger → diapers
Rule 2) coke, burger, potatoes → bread
Rule 3) coke, burger, potatoes → onion, bread
Rule 4) burger, potatoes, onion → coke
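Association rules like these are commonly compared by their support and confidence. The sketch below computes both for Rule 2, read as "coke, burger, potatoes implies bread". The transactions are hypothetical, invented only to make the arithmetic concrete, and the function names `support` and `confidence` are illustrative.

```python
# Hypothetical market-basket transactions, invented only for illustration.
transactions = [
    {"coke", "burger", "diapers"},
    {"coke", "burger", "potatoes", "bread"},
    {"coke", "burger", "potatoes", "onion", "bread"},
    {"burger", "potatoes", "onion"},
    {"coke", "burger"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule 2: coke, burger, potatoes implies bread
rule_support = support({"coke", "burger", "potatoes", "bread"}, transactions)
rule_confidence = confidence({"coke", "burger", "potatoes"}, {"bread"}, transactions)
```

On this toy data, two of the five baskets contain all four items, and every basket containing the antecedent also contains bread.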
Clustering
Data Mining Function: (5) Outlier Analysis
Outlier analysis
Outlier: A data object that does not comply with the
general behavior of the data
Noise or exception? One person’s garbage could be
another person’s treasure
Methods: by-product of clustering or regression analysis,
…
Useful in fraud detection, rare events analysis
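One way outliers can fall out as a by-product of regression analysis is sketched below: fit a least-squares line, then flag points whose residual is unusually large. The data points are hypothetical, and the cutoff of two residual standard deviations is an assumption chosen for illustration, not a rule from the lecture.

```python
import statistics

# Hypothetical (x, y) observations; the point at x = 6 is deliberately off-trend.
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0), (5, 9.9),
          (6, 30.0), (7, 14.1), (8, 16.0)]

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residual_outliers(points, threshold=2.0):
    """Flag points whose residual exceeds `threshold` residual standard deviations."""
    slope, intercept = fit_line(points)
    residuals = [y - (slope * x + intercept) for x, y in points]
    spread = statistics.pstdev(residuals)
    return [p for p, r in zip(points, residuals) if abs(r) > threshold * spread]

outliers = residual_outliers(points)
```

Whether the flagged point is noise or a meaningful exception (for example, a fraudulent transaction) still has to be decided by the analyst. Note also that the outlier itself pulls the fitted line toward it, which is why more robust fitting methods are often preferred in practice.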
Data Mining Function: (6) Prediction
The main idea is to use a large number of past
values to estimate probable future values.
Forecasting or predicting unavailable data
values, or a class label, for some data.
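A minimal sketch of predicting a future value from past values follows. It assumes the series has a roughly linear trend, which is only one of many possible models; the series and the function name `linear_trend_forecast` are hypothetical.

```python
# Hypothetical past values of some measure, one per time step (t = 0, 1, 2, ...).
history = [10.0, 12.0, 14.0, 16.0, 18.0]

def linear_trend_forecast(history, steps_ahead=1):
    """Fit y = slope * t + intercept by least squares, then extrapolate."""
    n = len(history)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(history) / n
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, history)) / sum(
        (t - mean_t) ** 2 for t in ts
    )
    intercept = mean_y - slope * mean_t
    # The last observed time step is n - 1; predict `steps_ahead` beyond it.
    return slope * (n - 1 + steps_ahead) + intercept

next_value = linear_trend_forecast(history)
```

Predicting a class label rather than a numeric value would use a classifier instead of a fitted line, but the pattern is the same: learn from past data, then apply the model to unseen data.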
Evaluation of Knowledge
Is all mined knowledge interesting?
One can mine a tremendous number of “patterns”
Some may fit only a certain dimension space (time,
location, …)
Some may not be representative, may be transient, …
Evaluation of mined knowledge → directly mine only
interesting knowledge?
Descriptive vs. predictive
Coverage
Typicality vs. novelty
Accuracy
Timeliness
…
Data Mining: Confluence of Multiple Disciplines
Summary
Class Activity
Discuss whether or not each of the following activities is a
data mining task.