
CSI3010 Data Warehousing and Data Mining
Module-4: Classification

Dr Sunil Kumar P V

SCOPE
Classification: Overview
Introduction

• Bank loan application
• Buys a computer
• Best treatment for a patient
• In each case, the data analysis task is classification
• A model or classifier is constructed to predict
categorical labels, such as
• “safe” or “risky” for the loan application data
• “yes” or “no” for the marketing data
• “treatment A,” “treatment B,” or “treatment C” for the medical data
Classification vs Prediction

• Predict how much a given customer will spend during a sale
• The model constructed predicts a continuous-valued function, or ordered value, as opposed to a categorical label
• This process is known as prediction (regression)
• The model is known as a predictor

Classification Step 1

Classification Step 2

Classification: Step 1 - Learning

• Learning step using the training set
• Training set: database tuples and their associated class labels
• A tuple X = (x1, x2, ..., xn) is an attribute vector; each training tuple is assumed to belong to a predefined class, given by its class label attribute (categorical, discrete, and unordered)
• Learning y = f(X), where y is the class label for tuple X
• Because the class label of each training tuple is provided, this step is also known as supervised learning
Classification: Step 2 - Prediction

• A test set is used, made up of test tuples and their associated class labels, which are not used to construct the classifier
• Accuracy of a classifier is the percentage of test set tuples that are correctly classified by the classifier

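To make this concrete, a minimal sketch of the accuracy computation in Python (the toy classifier and test tuples below are hypothetical, not from the slides):

```python
def accuracy(classifier, test_set):
    """Percentage of test tuples whose predicted label matches their true label."""
    correct = sum(1 for x, true_label in test_set if classifier(x) == true_label)
    return 100.0 * correct / len(test_set)

# Hypothetical usage: a toy rule-based classifier and three labelled test tuples
toy_classifier = lambda x: "yes" if x["income"] > 50000 else "no"
test_set = [({"income": 60000}, "yes"),
            ({"income": 30000}, "no"),
            ({"income": 55000}, "no")]
print(accuracy(toy_classifier, test_set))  # 2 of 3 correct -> 66.66...%
```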
Points to note

• Supervised vs unsupervised learning
• In prediction, there is no class label attribute, just the attribute to be predicted
• Prediction can be viewed as a mapping or function, y = f(X), where X is the input (e.g., a tuple describing a loan applicant), and the output y is a continuous or ordered value
• E.g., the predicted loan amount

Issues Regarding Classification
and Prediction
Data Preparation

• Prepare the data for classification and prediction to improve accuracy, efficiency, and scalability:
• Data cleaning to remove or reduce noise and handle missing values
• Relevance analysis to identify whether any two given attributes are statistically related (use correlation analysis)
• Attribute subset selection to remove irrelevant/redundant attributes
• Data transformation, e.g., normalization (see the sketch below)
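As a small illustration of the last point, a sketch of min-max normalization, one common normalization method (the income values below are made up):

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale a numeric attribute into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

# Hypothetical attribute values (e.g., income in thousands) before mining
incomes = [12, 30, 45, 98, 60]
print(min_max_normalize(incomes))  # [0.0, 0.209..., 0.383..., 1.0, 0.558...]
```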
Comparing Classification and Prediction Methods

• Accuracy
• Speed
• Robustness (performance given noisy data or
data with missing values)
• Scalability
• Interpretability (level of understanding and insight provided by the classifier/predictor)

Decision Tree Induction
Decision Tree Algorithm Basics

• Three algorithms, proposed in the 1980s and early 1990s:
• ID3 (Iterative Dichotomiser)
• C4.5 (a successor of ID3)
• Classification and Regression Trees (CART)
• The algorithm has three parameters: D, attribute_list, and Attribute_selection_method
• Refer to Figure 6.3, Page 293, Han and Kamber
2nd Ed Textbook
Decision Tree Algorithms

• D is a data partition, initially the entire training data set
• attribute_list is a list of attributes describing the tuples
• Attribute_selection_method specifies a heuristic procedure for selecting the attribute that “best” discriminates the given tuples according to class
• Common choices are information gain and the Gini index (the Gini index yields binary splits only)
• We use information gain, as our trees may not always be binary (a simplified sketch of the induction procedure follows below)
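To make the three parameters concrete, here is a simplified recursive sketch in my own condensed form, not a transcription of Figure 6.3; the attribute selection method is passed in as a function, and the data representation is an assumption:

```python
from collections import Counter

def induce_tree(D, attribute_list, attribute_selection_method):
    """Simplified recursive decision-tree induction.

    D                          -- list of (attribute_dict, class_label) tuples
    attribute_list             -- attributes still available for splitting
    attribute_selection_method -- function(D, attribute_list) -> best attribute
    """
    labels = [label for _, label in D]
    if len(set(labels)) == 1:              # all tuples in the same class -> leaf
        return labels[0]
    if not attribute_list:                 # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    A = attribute_selection_method(D, attribute_list)
    remaining = [a for a in attribute_list if a != A]
    tree = {A: {}}
    for v in {x[A] for x, _ in D}:         # one branch per observed value of A
        Dv = [(x, label) for x, label in D if x[A] == v]
        tree[A][v] = induce_tree(Dv, remaining, attribute_selection_method)
    return tree
```

A gain-based selection method is sketched after the information-gain slides; plugging it in with a training set yields trees like the one built in the rest of this module.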
Entropy(H)

• H(V) = Σ_k P(v_k) log2 (1 / P(v_k)) = −Σ_k P(v_k) log2 P(v_k)
• Let X be a collection of training examples
• p+ is the proportion of positive examples in X
• p− is the proportion of negative examples in X
• H(X) = −p+ log2 p+ − p− log2 p−

Entropy Examples

• H(X ) = −p+ log2 p+ − p− log2 p−


• Examples (taking 0 log2 0 = 0):
• H(14+, 0−) = −14/14 log2 (14/14) − 0/14 log2 (0/14) = 0
• H(9+, 5−) = −9/14 log2 (9/14) − 5/14 log2 (5/14) = 0.94
• H(7+, 7−) = −7/14 log2 (7/14) − 7/14 log2 (7/14) = 1

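These values can be checked with a few lines of Python; a minimal sketch (the function name entropy is my own choice):

```python
import math

def entropy(pos, neg):
    """H of a collection with pos positive and neg negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:                       # 0 * log2(0) is taken as 0
            p = count / total
            h -= p * math.log2(p)
    return h

print(entropy(14, 0))  # 0.0
print(entropy(9, 5))   # ~0.940
print(entropy(7, 7))   # 1.0
```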
Information Gain

• Estimates the expected reduction in entropy of the given data, caused by partitioning the examples based on an attribute
• The information gain Gain(X, A) of an attribute A, relative to a collection of examples X, is defined as

  Gain(X, A) = H(X) − Σ_{v ∈ Values(A)} (|X(v)| / |X|) H(X(v))

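A compact sketch of this definition in Python. It takes the class counts of the whole collection X and, for each value v of attribute A, the class counts of the subset X(v); the argument and function names are my own choices:

```python
import math

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c) if total else 0.0

def info_gain(pos, neg, counts_by_value):
    """Gain(X, A) = H(X) - sum over values v of |X(v)|/|X| * H(X(v)).

    pos, neg        -- positive/negative counts in the whole collection X
    counts_by_value -- {value v of A: (positives in X(v), negatives in X(v))}
    """
    total = pos + neg
    remainder = sum((p + n) / total * entropy(p, n)
                    for p, n in counts_by_value.values())
    return entropy(pos, neg) - remainder
```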
Decision Tree Demo

Entropy of Outlook

• Entropy of the dataset


• H(X ) = H(9+, 5−) =
−9/14 log2 (9/14) − 5/14 log2 (5/14) = 0.94
• To determine the entropy of Outlook
• Values of Outlook −→
Sunny , Overcast, Rainy

Entropy of Outlook

• H(Sunny) = H(2+, 3−) = −2/5 log2 (2/5) − 3/5 log2 (3/5) = 0.971
• H(Overcast) = H(4+, 0−) = −4/4 log2 (4/4) − 0/4 log2 (0/4) = 0
• H(Rainy) = H(3+, 2−) = −3/5 log2 (3/5) − 2/5 log2 (2/5) = 0.971

Information Gain of Outlook

• Gain(X, Outlook) = H(X) − Σ_{v ∈ {Sunny, Overcast, Rainy}} (|X(v)| / |X|) H(X(v))
• = H(X) − (5/14) H(Sunny) − (4/14) H(Overcast) − (5/14) H(Rainy)
• = 0.94 − (5/14) × 0.971 − (4/14) × 0 − (5/14) × 0.971 = 0.2464

Information Gain of Temperature

• Entropy of the dataset


• H(X ) = H(9+, 5−) =
−9/14 log2 (9/14) − 5/14 log2 (5/14) = 0.94
• Entropy of Temperature
• Values of Temperature −→ Hot, Mild, Cool
• H(Hot) = H(2+, 2−) =
−2/4 log2 (2/4) − 2/4 log2 (2/4) = 1
• H(Mild) = H(4+, 2−) =
−4/6 log2 (4/6) − 2/6 log2 (2/6) = 0.9183
• H(Cool) = H(3+, 1−) =
−3/4 log2 (3/4) − 1/4 log2 (1/4) = 0.8113
Information Gain of Temperature

• Gain(X, Temperature) = H(X) − Σ_{v ∈ {Hot, Mild, Cool}} (|X(v)| / |X|) H(X(v))
• = H(X) − (4/14) H(Hot) − (6/14) H(Mild) − (4/14) H(Cool)
• = 0.94 − (4/14) × 1 − (6/14) × 0.9183 − (4/14) × 0.8113 = 0.0289

Entropy of Humidity

• Entropy of the dataset


• H(X ) = H(9+, 5−) =
−9/14 log2 (9/14) − 5/14 log2 (5/14) = 0.94
• Entropy of Humidity
• Values of Humidity −→ High, Normal
• H(High) = H(3+, 4−) =
−3/7 log2 (3/7) − 4/7 log2 (4/7) = 0.9852
• H(Normal) = H(6+, 1−) =
−6/7 log2 (6/7) − 1/7 log2 (1/7) = 0.5916
Information Gain of Humidity

• Gain(X, Humidity) = H(X) − Σ_{v ∈ {High, Normal}} (|X(v)| / |X|) H(X(v))
• = H(X) − (7/14) H(High) − (7/14) H(Normal)
• = 0.94 − (7/14) × 0.9852 − (7/14) × 0.5916 = 0.1516

Entropy of Wind

• Entropy of the dataset


• H(X ) = H(9+, 5−) =
−9/14 log2 (9/14) − 5/14 log2 (5/14) = 0.94
• Entropy of Wind
• Values of Wind −→ Strong , Weak
• H(Strong ) = H(3+, 3−) =
−3/6 log2 (3/6) − 3/6 log2 (3/6) = 1
• H(Weak) = H(6+, 2−) =
−6/8 log2 (6/8) − 2/8 log2 (2/8) = 0.8113
Information Gain of Wind

• Gain(X, Wind) = H(X) − Σ_{v ∈ {Strong, Weak}} (|X(v)| / |X|) H(X(v))
• = H(X) − (6/14) H(Strong) − (8/14) H(Weak)
• = 0.94 − (6/14) × 1 − (8/14) × 0.8113 = 0.0478

Attribute with the Maximum Information Gain

• Gain(X , Outlook) = 0.2464


• Gain(X , Temperature) = 0.0289
• Gain(X , Humidity ) = 0.1516
• Gain(X , Wind) = 0.0478
• Attribute to be chosen is Outlook

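The root-node choice can be checked with the per-value (yes, no) counts tabulated on the preceding slides; a self-contained sketch (the small differences from the slide values, e.g. 0.2467 vs 0.2464, come only from the slides rounding intermediate entropies):

```python
import math

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c) if total else 0.0

def info_gain(pos, neg, counts_by_value):
    total = pos + neg
    rem = sum((p + n) / total * entropy(p, n) for p, n in counts_by_value.values())
    return entropy(pos, neg) - rem

# (yes, no) counts per attribute value, read off the slides; 9+ / 5- overall
attributes = {
    "Outlook":     {"Sunny": (2, 3), "Overcast": (4, 0), "Rainy": (3, 2)},
    "Temperature": {"Hot": (2, 2), "Mild": (4, 2), "Cool": (3, 1)},
    "Humidity":    {"High": (3, 4), "Normal": (6, 1)},
    "Wind":        {"Strong": (3, 3), "Weak": (6, 2)},
}

for name, counts in attributes.items():
    print(name, round(info_gain(9, 5, counts), 4))
# Outlook 0.2467, Temperature 0.0292, Humidity 0.1518, Wind 0.0481
# -> Outlook has the maximum gain and becomes the root, as on the slide above
```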
The tree so far

Dataset corresponding to Outlook = Sunny

X (Outlook = Sunny , Temperature)

• Entropy of the dataset X(Sunny)
• H(Sunny) = H(2+, 3−) = −2/5 log2 (2/5) − 3/5 log2 (3/5) = 0.97
• Values of Temperature −→ Hot, Mild, Cool
• H(Hot) = H(0+, 2−) = 0
• H(Mild) = H(1+, 1−) = 1
• H(Cool) = H(1+, 0−) = 0
• Gain(X(Sunny), Temperature) = H(Sunny) − Σ_{v ∈ {Hot, Mild, Cool}} (|X(v)| / |X|) H(X(v))
• = 0.97 − 2/5 × 0 − 2/5 × 1 − 1/5 × 0 = 0.570
X (Outlook = Sunny , Humidity )

• Entropy of the dataset X(Sunny)
• H(Sunny) = H(2+, 3−) = −2/5 log2 (2/5) − 3/5 log2 (3/5) = 0.97
• Values of Humidity −→ High, Normal
• H(High) = H(0+, 3−) = 0
• H(Normal) = H(2+, 0−) = 0
• Gain(X(Sunny), Humidity) = H(Sunny) − Σ_{v ∈ {High, Normal}} (|X(v)| / |X|) H(X(v))
• = 0.97 − 3/5 × 0 − 2/5 × 0 = 0.97

X (Outlook = Sunny , Wind)

• Entropy of the dataset X(Sunny)
• H(Sunny) = H(2+, 3−) = −2/5 log2 (2/5) − 3/5 log2 (3/5) = 0.97
• Values of Wind −→ Strong, Weak
• H(Strong) = H(1+, 1−) = 1
• H(Weak) = H(1+, 2−) = 0.9183
• Gain(X(Sunny), Wind) = H(Sunny) − Σ_{v ∈ {Strong, Weak}} (|X(v)| / |X|) H(X(v))
• = 0.97 − 2/5 × 1 − 3/5 × 0.9183 = 0.0192

Attribute with the Maximum Information Gain

• Gain(X (Sunny ), Temperature) = 0.570


• Gain(X (Sunny ), Humidity ) = 0.97
• Gain(X (Sunny ), Wind) = 0.0192
• Attribute to be chosen is Humidity

The tree so far

Dataset corresponding to Outlook = Rain

X (Outlook = Rain, Temperature)

• Entropy of the dataset X(Rain)
• H(Rain) = H(3+, 2−) = −3/5 log2 (3/5) − 2/5 log2 (2/5) = 0.97
• Values of Temperature −→ Hot, Mild, Cool
• H(Hot) = H(0+, 0−) = 0
• H(Mild) = H(2+, 1−) = 0.9183
• H(Cool) = H(1+, 1−) = 1
• Gain(X(Rain), Temperature) = H(Rain) − Σ_{v ∈ {Hot, Mild, Cool}} (|X(v)| / |X|) H(X(v))
• = 0.97 − 0/5 × 0 − 3/5 × 0.9183 − 2/5 × 1 = 0.0192
X (Outlook = Rain, Humidity )

• Entropy of the dataset X(Rain)
• H(Rain) = H(3+, 2−) = −3/5 log2 (3/5) − 2/5 log2 (2/5) = 0.97
• Values of Humidity −→ High, Normal
• H(High) = H(1+, 1−) = 1
• H(Normal) = H(2+, 1−) = 0.9183
• Gain(X(Rain), Humidity) = H(Rain) − Σ_{v ∈ {High, Normal}} (|X(v)| / |X|) H(X(v))
• = 0.97 − 2/5 × 1 − 3/5 × 0.9183 = 0.0192

X (Outlook = Rain, Wind)

• Entropy of the dataset X(Rain)
• H(Rain) = H(3+, 2−) = −3/5 log2 (3/5) − 2/5 log2 (2/5) = 0.97
• Values of Wind −→ Strong, Weak
• H(Strong) = H(0+, 2−) = 0
• H(Weak) = H(3+, 0−) = 0
• Gain(X(Rain), Wind) = H(Rain) − Σ_{v ∈ {Strong, Weak}} (|X(v)| / |X|) H(X(v))
• = 0.97 − 2/5 × 0 − 3/5 × 0 = 0.97

Attribute with the Maximum Information Gain

• Gain(X (Rain), Temperature) = 0.0192


• Gain(X (Rain), Humidity ) = 0.0192
• Gain(X (Rain), Wind) = 0.97
• Attribute to be chosen is Wind

The Final Decision Tree

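The tree that results from the computations above (Outlook at the root; the Sunny branch split on Humidity, the Overcast branch a pure "Yes" leaf, and the Rainy branch split on Wind) can be written down directly; a sketch using a nested dict and a small classify helper (this representation and the sample tuple are my own):

```python
# Internal nodes map an attribute name to {value: subtree}; leaves are class labels.
tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rainy":    {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(node, x):
    """Follow the tuple's attribute values down the tree until a leaf label is reached."""
    while isinstance(node, dict):
        attribute = next(iter(node))        # attribute tested at this node
        node = node[attribute][x[attribute]]
    return node

# Hypothetical test tuple
x = {"Outlook": "Rainy", "Temperature": "Mild", "Humidity": "High", "Wind": "Weak"}
print(classify(tree, x))  # -> "Yes"
```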
