CLASSIFICATION ALGORITHMS IN DATA MINING
A lecture given to MSc students
SUSHIL KULKARNI
Classification
What is classification?
Model Construction
ID3
Information Theory
Naïve Bayesian Classifier
CLASSIFICATION PROBLEM
Given a database D = {t1, t2, …, tn} and a set of classes C = {C1, …, Cm}, the classification problem is to define a mapping f : D → C where each ti is assigned to one class.
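As a minimal sketch (my own illustration, not from the lecture): once learned, the mapping f is just a function from tuples to class labels. The attribute "income" and the threshold are hypothetical.

# A classifier is a mapping f: D -> C; here C = {"low", "high"}.
def f(t: dict) -> str:
    """Assign the tuple t to one class, based on a hypothetical rule."""
    return "high" if t["income"] > 50000 else "low"

D = [{"name": "t1", "income": 72000}, {"name": "t2", "income": 31000}]
print([f(t) for t in D])  # ['high', 'low']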
CLASSIFICATION EXAMPLES
Why Classification? A Motivating Application
Credit approval
o A bank wants to classify its customers according to whether they are expected to pay back their approved loans
o The history of past customers is used to train the classifier
o The classifier provides rules that identify potentially reliable future customers
Why Classification? A Motivating Application
Credit approval
o Classification rule:
If age = "31...40" and income = high then credit_rating = excellent
o Future customers:
Suhas: age = 35, income = high ⇒ excellent credit rating
Heena: age = 20, income = medium ⇒ fair credit rating
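A quick sketch (my own, not part of the slides) of this rule in code; treating "fair" as the default class for customers the rule does not cover is my assumption:

# Hypothetical encoding of the classification rule from the slide.
def credit_rating(age: int, income: str) -> str:
    if 31 <= age <= 40 and income == "high":
        return "excellent"
    return "fair"  # assumed default for customers not matched by the rule

print(credit_rating(35, "high"))    # Suhas  -> excellent
print(credit_rating(20, "medium"))  # Heena  -> fair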
Classification — A Two-Step Process
Model construction: describe a set of predetermined classes (here Excellent and Fair) using the training set.
Model usage: use the constructed model to classify future or unseen objects.
Supervised Learning
Classification Process (1): Model Construction
[Diagram: the Training Data feed a Classification Algorithm, which outputs a Classifier (the model); Testing Data and Unseen Data such as the tuple (Dina, Professor, 4) are then put to the classifier, which answers "Teach?"]

NAME    RANK            YEARS  TENURED
Swati   Assistant Prof  2      no
Malika  Associate Prof  7      no
Tina    Professor       5      yes
June    Assistant Prof  7      yes
Model Construction: Example
Sr. Gender Age BP Drug
1 M 20 Normal A
2 F 73 Normal B
3 M 37 High A
4 M 33 Low B
5 F 48 High A
6 M 29 Normal A
7 F 52 Normal B
8 M 42 Low B
9 M 61 Normal B
10 F 30 Normal A
11 F 26 Low B
12 M 54 High A
Model Construction: Example
[Directed tree: the root node tests Blood Pressure; its branches lead to the leaves Drug A and Drug B.]
Model Construction: Example
The tree is constructed from the training data and classifies every training example correctly, so there is no training error.
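As a sketch of how such a tree can be built in practice (my own example, assuming scikit-learn and pandas are installed), one can fit a decision tree to the twelve records above and confirm the zero training error:

# A minimal sketch, assuming scikit-learn and pandas are available.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

data = pd.DataFrame({
    "Gender": ["M","F","M","M","F","M","F","M","M","F","F","M"],
    "Age":    [20, 73, 37, 33, 48, 29, 52, 42, 61, 30, 26, 54],
    "BP":     ["Normal","Normal","High","Low","High","Normal",
               "Normal","Low","Normal","Normal","Low","High"],
    "Drug":   ["A","B","A","B","A","A","B","B","B","A","B","A"],
})

# One-hot encode the categorical attributes so the tree can split on them.
X = pd.get_dummies(data[["Gender", "Age", "BP"]])
y = data["Drug"]

tree = DecisionTreeClassifier().fit(X, y)
print(tree.score(X, y))  # 1.0 -- no training error, as the slide notes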
Error and Support
Let t = total no. of data points, r = no. of data points at a node, max = no. of data points in the majority class at the node, and min = no. of data points in the minority class at the node. Then:
o Accuracy = max / r
o Error = min / r
o Support = max / t
Rules with Different Accuracy & Support
A split on X partitions 180 data points into two nodes:

Node P (X < 60): 115 of class A, 5 of class B
  Accuracy = 115/120, Error = 5/120, Support = 115/180
Node Q (X > 60): 58 of class A, 2 of class B
  Accuracy = 58/60, Error = 2/60, Support = 58/180
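These numbers are easy to check in code. A small sketch (my own, using the definitions above):

# Compute accuracy, error and support from the class counts at a node.
def node_measures(counts: dict, t: int) -> dict:
    r = sum(counts.values())                  # data points at the node
    mx, mn = max(counts.values()), min(counts.values())
    return {"accuracy": mx / r, "error": mn / r, "support": mx / t}

t = 180  # total data points
print(node_measures({"A": 115, "B": 5}, t))  # node P: 0.958..., 0.0416..., 0.638...
print(node_measures({"A": 58, "B": 2}, t))   # node Q: 0.966..., 0.0333..., 0.322...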
Criteria to grow the tree
CLASSIFICATION TREES FOR CATEGORICAL ATTRIBUTES
INDUCTION OF DECISION TREES [ID3]
Decision tree generation consists of two phases
o Tree construction
At the start, all the training examples are at the root
Partition the examples recursively based on selected attributes
o Tree pruning
• Identify and remove branches that reflect noise or outliers

[Example tree for buys_computer: the root tests age (<=30, 31..40, >40); the 31..40 branch is a "yes" leaf, while the other two branches test a further attribute (student, credit_rating) before reaching yes/no leaves.]
ANOTHER EXAMPLE: MARKS
[Decision tree on the mark x: the root tests x against 90; x >= 90 leads to the leaf A, and the x < 90 branch is split further, with lower tests leading to leaves such as D (x >= 60) and F (x < 50).]
o If x >= 90 then grade = A.
o If 80 <= x < 90 then grade = B.
ALGORITHM FOR ID3
Basic algorithm (a greedy algorithm):
The tree is constructed in a top-down, recursive, divide-and-conquer manner
At the start, all the training examples are at the root
Attributes are categorical
Samples are partitioned recursively based on selected attributes
Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
ID3: ADVANTAGES
Easy to understand.
ID3: DISADVANTAGES
INFORMATION THEORY
When all the marbles in the bowl are mixed up, the arrangement conveys little information: the uncertainty is at its maximum.
ENTROPY
BUILDING THE TREE
Information Gain (ID3)
Select the attribute with the highest information gain.
Assume there are two classes, P and N.
Let the set S contain p elements of class P and n elements of class N.
The amount of information needed to decide whether an arbitrary object in S belongs to P or N is defined as

I(p, n) = − (p / (p + n)) · log2 (p / (p + n)) − (n / (p + n)) · log2 (n / (p + n))
Information Gain in Decision Tree Induction
Assume that using attribute A, a set S is partitioned into sets {S1, S2, …, Sv}.
If Si contains pi elements of P and ni elements of N, the entropy, i.e. the expected information needed to classify objects in all subtrees Si, is

E(A) = Σ (i = 1 to v) ((pi + ni) / (p + n)) · I(pi, ni)

The information gained by branching on A is then Gain(A) = I(p, n) − E(A); ID3 selects the attribute with the largest gain.
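Both formulas fit in a few lines of Python. A sketch (my own code; the function names are mine), checked against the play-tennis data used later in this lecture:

# Information measure I(p, n) and expected information E(A) for ID3.
from math import log2

def info(p: int, n: int) -> float:
    """I(p, n): bits needed to decide whether an object is in P or N."""
    total = p + n
    result = 0.0
    for c in (p, n):
        if c:  # 0 * log2(0) is taken as 0
            result -= (c / total) * log2(c / total)
    return result

def expected_info(partitions: list[tuple[int, int]]) -> float:
    """E(A): weighted information over the subsets S1..Sv induced by A."""
    p = sum(pi for pi, _ in partitions)
    n = sum(ni for _, ni in partitions)
    return sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in partitions)

# Example: the outlook attribute of the play-tennis data (9 P, 5 N):
# sunny -> (2 P, 3 N), overcast -> (4 P, 0 N), rain -> (3 P, 2 N)
outlook = [(2, 3), (4, 0), (3, 2)]
gain = info(9, 5) - expected_info(outlook)
print(round(gain, 3))  # about 0.247 bits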
Output: ID3 for "buys_computer"
[Resulting tree: the root tests age; the 31..40 branch is a "yes" leaf, the <=30 branch tests student, and the >40 branch tests credit_rating; the yes/no leaves are spelled out as IF-THEN rules in the rule-extraction slides later.]
CART
CART [CLASSIFICATION AND REGRESSION TREE]
The algorithm is similar to ID3 but uses the Gini index as an impurity measure to select splitting variables.
If the target variable is nominal and has more than two categories, the option of merging the target categories into two super-categories may be considered. This process is called twoing.
Gini Index (IBM Intelligent Miner)
If a data set T contains examples from n classes, the Gini index gini(T) is defined as

gini(T) = 1 − Σ (j = 1 to n) pj²

where pj is the relative frequency of class j in T.
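A one-function sketch (mine, not from the slides):

# Gini index of a node from its class counts.
def gini(counts: list[int]) -> float:
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([9, 5]))  # play-tennis root: 1 - (9/14)^2 - (5/14)^2 ~ 0.459
print(gini([4, 0]))  # a pure node has Gini index 0.0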
Extracting Classification Rules from Trees
Represent the knowledge in the form of IF-THEN rules.
Extracting Classification Rules from Trees
One rule is created for each path from the root to a leaf; the leaf node holds the class prediction.
Rules are easy for humans to understand.
Example:
IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31…40" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "no"
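These rules translate directly into code; a sketch (my own) of the rule set as one function:

# The five IF-THEN rules above, written as a single function.
def buys_computer(age: str, student: str, credit_rating: str) -> str:
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    if age == "31...40":
        return "yes"
    # age == ">40": decided by the credit rating, as in the rules above
    return "yes" if credit_rating == "excellent" else "no"

print(buys_computer("<=30", "yes", "fair"))     # yes
print(buys_computer(">40", "no", "excellent"))  # yes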
BAYESIAN CLASSIFICATION
Classification and Regression
What is classification? What is regression?
Issues regarding classification and regression
Classification by decision tree induction
Bayesian classification
Other classification methods
Regression
What is Bayesian Classification?
Bayesian classifiers are statistical classifiers: they predict class membership probabilities, i.e. the probability that a given sample belongs to a particular class.
Naive Bayesian Classifier: "play tennis?" Example

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N
Naive Bayesian Classifier Example
The 14 examples split into 9 of class P and 5 of class N:

Class P (9 examples):
Outlook   Temperature  Humidity  Windy
overcast  hot          high      false
rain      mild         high      false
rain      cool         normal    false
overcast  cool         normal    true
sunny     cool         normal    false
rain      mild         normal    false
sunny     mild         normal    true
overcast  mild         high      true
overcast  hot          normal    false

Class N (5 examples):
Outlook   Temperature  Humidity  Windy
sunny     hot          high      false
sunny     hot          high      true
rain      cool         normal    true
sunny     mild         high      false
rain      mild         high      true
Naive Bayesian Classifier Example
Given the training set, we compute the conditional probabilities:

Outlook   P    N       Humidity  P    N
sunny     2/9  3/5     high      3/9  4/5
overcast  4/9  0       normal    6/9  1/5
rain      3/9  2/5

Temperature  P    N    Windy  P    N
hot          2/9  2/5  true   3/9  3/5
mild         4/9  2/5  false  6/9  2/5
cool         3/9  1/5
Naive Bayesian Classifier Example
To classify a new day X:
outlook = sunny, temperature = cool, humidity = high, windy = false

P(X|p) · P(p) = P(sunny|p) · P(cool|p) · P(high|p) · P(false|p) · P(p)
= 2/9 · 3/9 · 3/9 · 6/9 · 9/14 = 0.010582
P(X|n) · P(n) = P(sunny|n) · P(cool|n) · P(high|n) · P(false|n) · P(n)
= 3/5 · 1/5 · 4/5 · 2/5 · 5/14 = 0.013714
Sample X is classified in class N (don't play).
Naive Bayesian Classifier Example
Probability of 'Playing'
= 0.010582 / (0.010582 + 0.013714) = 44%
Probability of 'Not Playing'
= 0.013714 / (0.010582 + 0.013714) = 56%
Therefore X takes class label N.
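The whole calculation fits in a short script. A sketch (my own code) that reproduces the tables and the classification above from counts:

# Naive Bayes on the play-tennis data: estimate P(class) and P(value|class)
# from counts, then score the new day X = (sunny, cool, high, false).
from collections import Counter, defaultdict

rows = [  # (outlook, temperature, humidity, windy, class)
    ("sunny","hot","high","false","N"), ("sunny","hot","high","true","N"),
    ("overcast","hot","high","false","P"), ("rain","mild","high","false","P"),
    ("rain","cool","normal","false","P"), ("rain","cool","normal","true","N"),
    ("overcast","cool","normal","true","P"), ("sunny","mild","high","false","N"),
    ("sunny","cool","normal","false","P"), ("rain","mild","normal","false","P"),
    ("sunny","mild","normal","true","P"), ("overcast","mild","high","true","P"),
    ("overcast","hot","normal","false","P"), ("rain","mild","high","true","N"),
]

class_counts = Counter(r[-1] for r in rows)
cond = defaultdict(Counter)  # (attribute index, class) -> value counts
for r in rows:
    for i, v in enumerate(r[:-1]):
        cond[(i, r[-1])][v] += 1

def score(x, c):
    p = class_counts[c] / len(rows)              # prior P(c)
    for i, v in enumerate(x):
        p *= cond[(i, c)][v] / class_counts[c]   # conditional P(v|c)
    return p

x = ("sunny", "cool", "high", "false")
for c in ("P", "N"):
    print(c, round(score(x, c), 6))  # P 0.010582, N 0.013714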
REGRESSION
What Is Regression?
Regression is similar to classification
o First, construct a model
o Second, use the model to predict an unknown value
The major method is regression analysis
• Linear and multiple regression
• Non-linear regression
Regression is different from classification
o Classification predicts a categorical class label
o Regression models continuous-valued functions
Predictive Modeling in Databases
Predictive modeling: predict data values or construct generalized linear models based on the database data.
One can only predict value ranges or category distributions.
Determine the major factors that influence the prediction
o Data relevance analysis: uncertainty measurement, entropy analysis, expert judgement, etc.
Regression Analysis and Log-Linear Models in Regression
Linear regression: Y = α + β X
The two coefficients are estimated by least squares from the s training pairs (xi, yi):

β = Σ (i = 1 to s) (xi − x̄)(yi − ȳ) / Σ (i = 1 to s) (xi − x̄)²
α = ȳ − β x̄
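A sketch of these two estimators (my own code, no libraries needed):

# Least-squares estimates of alpha and beta for Y = alpha + beta * X.
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    s = len(xs)
    x_bar, y_bar = sum(xs) / s, sum(ys) / s
    beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
         / sum((x - x_bar) ** 2 for x in xs)
    alpha = y_bar - beta * x_bar
    return alpha, beta

alpha, beta = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # data from Y = 1 + 2X
print(alpha, beta)  # 1.0 2.0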
Regression Analysis and Log-Linear Models in Regression
Multiple regression: Y = b0 + b1 X1 + b2 X2
Many nonlinear functions can be transformed into the above.
E.g., Y = b0 + b1 X + b2 X² + b3 X³ becomes linear with X1 = X, X2 = X², X3 = X³.
Log-linear models:
The multi-way table of joint probabilities is approximated by a product of lower-order tables.
Probability: p(a, b, c, d) = αab · βac · χad · δbcd
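The transformation trick is easy to demonstrate. A sketch (my own example, assuming NumPy is available) fitting a quadratic by plain linear least squares:

# Fit Y = b0 + b1*X + b2*X^2 by linear least squares on (1, X, X^2).
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 + 3 * x - 1 * x**2          # synthetic data from a known polynomial

A = np.column_stack([np.ones_like(x), x, x**2])  # columns: 1, X1=X, X2=X^2
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coeffs.round(6))  # [ 2.  3. -1.]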
THANKS!
SUSHIL KULKARNI