
Data Analytics

Session 4-5: Decision Trees / Classification and Regression Trees

Arghya Ray
Decision Tree

• A decision tree is a popular classification method that results in a flowchart-like tree structure, where each internal node denotes a test on an attribute value and each branch represents an outcome of the test. The leaves of the tree represent the classes.
• A decision tree is a model that is both predictive and descriptive.

• Advantages:
• The decision tree approach is widely used since it is efficient and can deal with both continuous and categorical variables.
• It can deal with missing values in the training data and can tolerate some errors in the data.
• It is perhaps the best approach when each attribute takes only a small number of possible values.

• Disadvantages:
• Decision trees are less appropriate for estimation tasks where the goal is to predict the value of a continuous variable, such as a share price or an interest rate.
• Decision trees can produce a large number of errors when the number of training examples per class is small.
• The complexity of a decision tree increases as the number of attributes increases.

• Measuring the quality of a decision tree is an interesting problem in itself. Classification accuracy determined using test data is an obvious measure, but other measures, such as the average cost or the worst-case cost of classifying an object, may also be used.
• Decision trees can also be used to visualize classification rules.
[Figure: example decision tree, from Velocity Business Solutions: https://www.vebuso.com/2020/01/decision-tree-intuition-from-concept-to-application/]

Classification and Regression Trees


Goal: Classify or predict an outcome based on a set of predictors.
The output is a set of rules.
Example:
• Goal: classify a record as "will accept credit card offer" or "will not accept".
• A rule might be "IF (Income > 92.5) AND (Education < 1.5) AND (Family <= 2.5) THEN Class = 0 (non-acceptor)"; a code sketch of this rule appears below.
• Recursive partitioning: repeatedly split the records into two parts so as to achieve maximum homogeneity within the new parts.
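As an illustration, the example rule can be written directly as code. This is a minimal sketch: the field names (Income, Education, Family) and the dict-based record layout are assumptions for the example, and the fallback class for records the rule does not cover is likewise assumed.

```python
def classify(record):
    """Apply the example rule from the slide to one record.

    `record` is assumed to be a dict with the hypothetical keys
    'Income', 'Education', and 'Family'. Returns 0 (non-acceptor)
    when the rule fires.
    """
    if (record["Income"] > 92.5
            and record["Education"] < 1.5
            and record["Family"] <= 2.5):
        return 0  # non-acceptor
    return 1  # assumed default for records the rule does not cover

# Example usage:
print(classify({"Income": 100.0, "Education": 1.0, "Family": 2}))  # -> 0
```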

Recursive partitioning steps:

• Pick one of the predictor variables, x_i.
• Pick a value of x_i, say s_i, that divides the training data into two (not necessarily equal) portions.
• Measure how "pure" or homogeneous each of the resulting portions is.
• "Pure" = containing records of mostly one class.
• The algorithm tries different variables x_i and different split values s_i to maximize purity in a split (see the sketch after this list).
• After you get a "maximum purity" split, repeat the process for a second split, and so on.
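A minimal sketch of one round of this search, assuming a single numeric predictor and Gini impurity (defined on the "Measuring Impurity" slide below); the function names and data layout are illustrative, not from any particular library.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Try every threshold s_i on one numeric predictor x_i and return
    the split minimizing the weighted impurity of the two portions
    (i.e., maximizing purity)."""
    n = len(values)
    best = (None, float("inf"))
    for s in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= s]
        right = [y for x, y in zip(values, labels) if x > s]
        if not left or not right:
            continue  # a split must produce two non-empty portions
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (s, score)
    return best  # (threshold s_i, weighted impurity)

# Example: income values with accept/reject labels.
print(best_split([60, 85, 95, 110], ["No", "No", "Yes", "Yes"]))  # -> (85, 0.0)
```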
Forming a tree from the given example

RID | Age   | Income | Student | Credit rating | Class (buys computer)
1   | <=30  | High   | No  | Fair      | No
2   | <=30  | High   | No  | Excellent | No
3   | 31-40 | High   | No  | Fair      | Yes
4   | >40   | Medium | No  | Fair      | Yes
5   | >40   | Low    | Yes | Fair      | Yes
6   | >40   | Low    | Yes | Excellent | No
7   | 31-40 | Low    | Yes | Excellent | Yes
8   | <=30  | Medium | No  | Fair      | No
9   | <=30  | Low    | Yes | Fair      | Yes
10  | >40   | Medium | Yes | Excellent | Yes
11  | <=30  | Medium | Yes | Excellent | Yes
12  | 31-40 | Medium | No  | Excellent | Yes
13  | 31-40 | High   | Yes | Fair      | Yes
14  | >40   | Medium | No  | Excellent | No
Age

Splitting on Age produces three branches. The 31-40 branch is already pure (all Yes), so it becomes a leaf.

Age <= 30:
Income | Student | Credit rating | Class
High   | No  | Fair      | No
High   | No  | Excellent | No
Medium | No  | Fair      | No
Low    | Yes | Fair      | Yes
Medium | Yes | Excellent | Yes

Age 31-40:
Income | Student | Credit rating | Class
High   | No  | Fair      | Yes
Low    | Yes | Excellent | Yes
Medium | No  | Excellent | Yes
High   | Yes | Fair      | Yes

Age > 40:
Income | Student | Credit rating | Class
Medium | No  | Fair      | Yes
Low    | Yes | Fair      | Yes
Low    | Yes | Excellent | No
Medium | Yes | Excellent | Yes
Medium | No  | Excellent | No
Measuring Impurity

• Gini index (measure of impurity)

• Gini index for rectangle A, where $p_k$ is the proportion of cases in A that belong to class k, for classes k = 1, ..., m:

  $I(A) = 1 - \sum_{k=1}^{m} p_k^2$

• I(A) = 0 when all cases belong to the same class (most pure)

• Entropy (measure of impurity)

• Entropy for rectangle A, with $p_k$ as above:

  $\text{entropy}(A) = -\sum_{k=1}^{m} p_k \log_2(p_k)$

• Entropy ranges between 0 (most pure) and $\log_2(m)$ (equal representation of all m classes)
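Both measures are easy to compute directly. A minimal sketch, assuming class labels arrive as a plain list; the `base` parameter is an addition so the worked exercise later in the deck, which takes logarithms base 3 (one per class) to keep entropy in [0, 1], can reuse the same function.

```python
import math
from collections import Counter

def gini_index(labels):
    """I(A) = 1 - sum over classes k of p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels, base=2):
    """entropy(A) = -sum over classes k of p_k * log_base(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n, base)
                for c in Counter(labels).values())

# A pure rectangle scores 0 on both measures;
# a 50/50 rectangle scores 0.5 (Gini) and 1 bit (entropy).
print(gini_index(["Yes", "Yes", "No", "No"]))  # -> 0.5
print(entropy(["Yes", "Yes", "No", "No"]))     # -> 1.0
print(entropy(["Yes", "Yes", "Yes", "Yes"]))   # -> -0.0 (i.e., 0: pure)
```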
Using the principle of information entropy, build a decision tree from the training data given below. Divide the 'Credit rating' attribute into ranges as follows: (0, 1.6], (1.6, 1.7], (1.7, 1.8], (1.8, 1.9], (1.9, 2.0], (2.0, 5.0].
Sr. No. | Profession | Credit rating | Class
1  | Business | 1.60 | Buys only laptop
2  | Service  | 2.00 | Buys laptop with CD writer
3  | Business | 1.90 | Buys laptop with printer
4  | Business | 1.88 | Buys laptop with printer
5  | Business | 1.70 | Buys only laptop
6  | Service  | 1.85 | Buys laptop with printer
7  | Business | 1.60 | Buys only laptop
8  | Service  | 1.70 | Buys only laptop
9  | Service  | 2.20 | Buys laptop with CD writer
10 | Service  | 2.10 | Buys laptop with CD writer
11 | Business | 1.80 | Buys laptop with printer
12 | Service  | 1.95 | Buys laptop with printer
13 | Business | 1.90 | Buys laptop with printer
14 | Business | 1.80 | Buys laptop with printer
15 | Business | 1.75 | Buys laptop with printer
Profession

Splitting on Profession:

Profession = Business:
Sr. No. | Credit rating | Class
1 | 1.60 | Buys only laptop
2 | 1.90 | Buys laptop with printer
3 | 1.88 | Buys laptop with printer
4 | 1.70 | Buys only laptop
5 | 1.60 | Buys only laptop
6 | 1.80 | Buys laptop with printer
7 | 1.90 | Buys laptop with printer
8 | 1.80 | Buys laptop with printer
9 | 1.75 | Buys laptop with printer

Profession = Service:
Sr. No. | Credit rating | Class
1 | 2.00 | Buys laptop with CD writer
2 | 1.85 | Buys laptop with printer
3 | 1.70 | Buys only laptop
4 | 2.20 | Buys laptop with CD writer
5 | 2.10 | Buys laptop with CD writer
6 | 1.95 | Buys laptop with printer
Credit Rating

Splitting on Credit rating (first two ranges shown):

Credit rating in (0, 1.6]:
Sr. No. | Profession | Class
1 | Business | Buys only laptop
2 | Business | Buys only laptop

Credit rating in (1.6, 1.7]:
Sr. No. | Profession | Class
1 | Business | Buys only laptop
2 | Service  | Buys only laptop
• Initially there are 3 classes: Buys only laptop (4 cases), Buys laptop with CD writer (3 cases), Buys laptop with printer (8 cases). With m = 3 classes, logarithms are taken base 3, so entropy ranges between 0 and 1.

• Initial overall entropy: $E_0 = -\frac{4}{15}\log_3\frac{4}{15} - \frac{3}{15}\log_3\frac{3}{15} - \frac{8}{15}\log_3\frac{8}{15} = 0.918$

• Based on Profession: 9 Business (3 buy only laptop, 6 with printer), 6 Service (1 only laptop, 2 with printer, 3 with CD writer)

• $\text{Entropy(Profession)} = \frac{9}{15}\left(-\frac{3}{9}\log_3\frac{3}{9} - \frac{6}{9}\log_3\frac{6}{9}\right) + \frac{6}{15}\left(-\frac{1}{6}\log_3\frac{1}{6} - \frac{2}{6}\log_3\frac{2}{6} - \frac{3}{6}\log_3\frac{3}{6}\right) = 0.716$

• Information Gain (Profession) = $E_0 - \text{Entropy(Profession)} = 0.918 - 0.716 = 0.202$

• The ranges (2, 5], (0, 1.6], (1.6, 1.7], (1.7, 1.8], and (1.8, 1.9] each contain cases of a single class, so Entropy(CR (2,5]) = Entropy(CR (0,1.6]) = Entropy(CR (1.6,1.7]) = Entropy(CR (1.7,1.8]) = Entropy(CR (1.8,1.9]) = 0

• Entropy(CR (1.9,2]) $= -\frac{1}{2}\log_3\frac{1}{2} - \frac{1}{2}\log_3\frac{1}{2} = 0.631$ (one printer case at 1.95, one CD-writer case at 2.0)

• $\text{Entropy(Credit rating)} = \frac{2}{15}(0) + \frac{2}{15}(0) + \frac{3}{15}(0) + \frac{4}{15}(0) + \frac{2}{15}(0.631) + \frac{2}{15}(0) = 0.084$

• Information Gain (Credit rating) = $0.918 - 0.084 = 0.834$

• Credit rating gives the larger information gain (0.834 > 0.202), so it is chosen as the first (root) split.
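These figures can be checked numerically. A minimal sketch, assuming the 15-record training table above, base-3 entropy as noted, and illustrative variable names; exact arithmetic gives 0.919, 0.203, and 0.835, which the slide's intermediate rounding reports as 0.918, 0.202, and 0.834.

```python
import math
from collections import Counter

def entropy3(labels):
    """Entropy with log base 3 (m = 3 classes), so values lie in [0, 1]."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n, 3) for c in Counter(labels).values())

# (profession, credit rating, class) for the 15 training records.
data = [
    ("Business", 1.60, "only"), ("Service", 2.00, "cd"), ("Business", 1.90, "printer"),
    ("Business", 1.88, "printer"), ("Business", 1.70, "only"), ("Service", 1.85, "printer"),
    ("Business", 1.60, "only"), ("Service", 1.70, "only"), ("Service", 2.20, "cd"),
    ("Service", 2.10, "cd"), ("Business", 1.80, "printer"), ("Service", 1.95, "printer"),
    ("Business", 1.90, "printer"), ("Business", 1.80, "printer"), ("Business", 1.75, "printer"),
]
classes = [c for _, _, c in data]
e0 = entropy3(classes)

def weighted_entropy(groups):
    n = sum(len(g) for g in groups)
    return sum(len(g) / n * entropy3(g) for g in groups)

# Split on Profession.
by_prof = [[c for p, _, c in data if p == v] for v in ("Business", "Service")]
gain_prof = e0 - weighted_entropy(by_prof)

# Split on Credit rating, using the ranges given in the exercise.
cuts = [0, 1.6, 1.7, 1.8, 1.9, 2.0, 5.0]
by_cr = [[c for _, r, c in data if lo < r <= hi]
         for lo, hi in zip(cuts, cuts[1:])]
gain_cr = e0 - weighted_entropy(by_cr)

print(round(e0, 3), round(gain_prof, 3), round(gain_cr, 3))  # -> 0.919 0.203 0.835
```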


Credit Rating

The resulting decision tree (adjacent pure ranges merged):

• (0, 1.7] → Buys only laptop
• (1.7, 1.9] → Buys laptop with printer
• (1.9, 2] → Profession (Service): Buys laptop with CD writer (P = 0.5) / Buys laptop with printer (P = 0.5)
• (2, 5] → Buys laptop with CD writer

(Both cases in (1.9, 2] are Service, so Profession cannot separate them and each class is assigned probability 0.5.)
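Read as code, the finished tree is a small classifier. A minimal sketch, assuming the merged ranges shown above; the 50/50 leaf is returned as a probability distribution rather than a single class.

```python
def predict(credit_rating):
    """Classify one case using the final tree above.

    Returns a dict mapping class labels to probabilities, so the
    50/50 leaf in (1.9, 2] can be represented honestly.
    """
    if credit_rating <= 1.7:
        return {"Buys only laptop": 1.0}
    if credit_rating <= 1.9:
        return {"Buys laptop with printer": 1.0}
    if credit_rating <= 2.0:
        # All training cases here are Service, so Profession adds nothing.
        return {"Buys laptop with CD writer": 0.5,
                "Buys laptop with printer": 0.5}
    return {"Buys laptop with CD writer": 1.0}

print(predict(1.65))  # -> {'Buys only laptop': 1.0}
print(predict(1.95))  # -> the 50/50 leaf
```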
The content of these slides is prepared from different textbooks.

References:

• Data Mining and Predictive Analytics, By Daniel T. Larose. Copyright 2015 John Wiley & Sons, Inc.

• Predictive Analytics for Dummies, By Anasse Bari, Mohamed Chaouchi, and Tommy Jung. Copyright 2016 John Wiley & Sons, Inc.

• Introduction to Data Mining with Case Studies, By G.K. Gupta. Copyright 2014 by PHI Learning Private Limited.
Thank you.
