
Data Analytics

Session 4-5: Decision Trees / Classification and Regression Trees

Arghya Ray
Decision Tree

• A decision tree is a popular classification method that results in a flowchart-like tree structure, where each internal node denotes a test on an attribute value and each branch represents an outcome of the test. The leaves of the tree represent the classes.
• A decision tree is a model that is both predictive and descriptive.

• Advantages:
• The decision tree approach is widely used since it is efficient and can deal with both continuous and categorical variables.
• It can deal with missing values in the training data and can tolerate some errors in the data.
• It is perhaps the best approach when each attribute takes only a small number of possible values.

• Disadvantages:
• Decision trees are less appropriate for estimation tasks where the goal is to predict the value of a continuous variable, such as a share price or an interest rate.
• Decision trees can produce a large number of errors when the number of training examples per class is small.
• The complexity of a decision tree increases as the number of attributes increases.

• Measuring the quality of a decision tree is an interesting problem in itself. Classification accuracy determined using test data is an obvious measure, but other measures, such as the average cost or the worst-case cost of classifying an object, may also be used.
• Decision trees can also be used to visualize classification rules.
[Figure: example decision tree, from Velocity Business Solutions: https://www.vebuso.com/2020/01/decision-tree-intuition-from-concept-to-application/]

Classification and Regression Trees


Goal: Classify or predict an outcome based on a set of predictors.
The output is a set of rules.
Example:
• Goal: classify a record as "will accept credit card offer" or "will not accept".
• A rule might be "IF (Income > 92.5) AND (Education < 1.5) AND (Family <= 2.5) THEN Class = 0 (non-acceptor)"; a code sketch of this rule appears below.
• Recursive partitioning: repeatedly split the records into two parts so as to achieve maximum homogeneity within the new parts.
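As an illustration, the example rule can be written directly as code. This is a minimal sketch: the field names (Income, Education, Family) and the dict-based record layout are assumptions for the example, and the fallback class for records the rule does not cover is likewise assumed.

```python
def classify(record):
    """Apply the example rule from the slide to one record.

    `record` is assumed to be a dict with the hypothetical keys
    'Income', 'Education', and 'Family'. Returns 0 (non-acceptor)
    when the rule fires.
    """
    if (record["Income"] > 92.5
            and record["Education"] < 1.5
            and record["Family"] <= 2.5):
        return 0  # non-acceptor
    return 1  # assumed default for records the rule does not cover

# Example usage:
print(classify({"Income": 100.0, "Education": 1.0, "Family": 2}))  # -> 0
```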

Recursive partitioning steps:

• Pick one of the predictor variables, x_i.
• Pick a value of x_i, say s_i, that divides the training data into two (not necessarily equal) portions.
• Measure how "pure" or homogeneous each of the resulting portions is.
• "Pure" = containing records of mostly one class.
• The algorithm tries different variables x_i and different split values s_i to maximize purity in a split (see the sketch after this list).
• After you get a "maximum purity" split, repeat the process for a second split, and so on.
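A minimal sketch of one round of this search, assuming a single numeric predictor and Gini impurity (defined on the "Measuring Impurity" slide below); the function names and data layout are illustrative, not from any particular library.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Try every threshold s_i on one numeric predictor x_i and return
    the split minimizing the weighted impurity of the two portions
    (i.e., maximizing purity)."""
    n = len(values)
    best = (None, float("inf"))
    for s in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= s]
        right = [y for x, y in zip(values, labels) if x > s]
        if not left or not right:
            continue  # a split must produce two non-empty portions
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (s, score)
    return best  # (threshold s_i, weighted impurity)

# Example: income values with accept/reject labels.
print(best_split([60, 85, 95, 110], ["No", "No", "Yes", "Yes"]))  # -> (85, 0.0)
```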
Forming a tree from the given example

RID | Age   | Income | Student | Credit rating | Class (buys computer)
1   | <=30  | High   | No  | Fair      | No
2   | <=30  | High   | No  | Excellent | No
3   | 31-40 | High   | No  | Fair      | Yes
4   | >40   | Medium | No  | Fair      | Yes
5   | >40   | Low    | Yes | Fair      | Yes
6   | >40   | Low    | Yes | Excellent | No
7   | 31-40 | Low    | Yes | Excellent | Yes
8   | <=30  | Medium | No  | Fair      | No
9   | <=30  | Low    | Yes | Fair      | Yes
10  | >40   | Medium | Yes | Excellent | Yes
11  | <=30  | Medium | Yes | Excellent | Yes
12  | 31-40 | Medium | No  | Excellent | Yes
13  | 31-40 | High   | Yes | Fair      | Yes
14  | >40   | Medium | No  | Excellent | No
Age

Splitting on Age produces three branches. The 31-40 branch is already pure (all Yes), so it becomes a leaf.

Age <= 30:
Income | Student | Credit rating | Class
High   | No  | Fair      | No
High   | No  | Excellent | No
Medium | No  | Fair      | No
Low    | Yes | Fair      | Yes
Medium | Yes | Excellent | Yes

Age 31-40:
Income | Student | Credit rating | Class
High   | No  | Fair      | Yes
Low    | Yes | Excellent | Yes
Medium | No  | Excellent | Yes
High   | Yes | Fair      | Yes

Age > 40:
Income | Student | Credit rating | Class
Medium | No  | Fair      | Yes
Low    | Yes | Fair      | Yes
Low    | Yes | Excellent | No
Medium | Yes | Excellent | Yes
Medium | No  | Excellent | No
Measuring Impurity

• Gini index (measure of impurity)

• Gini index for rectangle A, where $p_k$ is the proportion of cases in A that belong to class k, for classes k = 1, ..., m:

  $I(A) = 1 - \sum_{k=1}^{m} p_k^2$

• I(A) = 0 when all cases belong to the same class (most pure)

• Entropy (measure of impurity)

• Entropy for rectangle A, with $p_k$ as above:

  $\text{entropy}(A) = -\sum_{k=1}^{m} p_k \log_2(p_k)$

• Entropy ranges between 0 (most pure) and $\log_2(m)$ (equal representation of all m classes)
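Both measures are easy to compute directly. A minimal sketch, assuming class labels arrive as a plain list; the `base` parameter is an addition so the worked exercise later in the deck, which takes logarithms base 3 (one per class) to keep entropy in [0, 1], can reuse the same function.

```python
import math
from collections import Counter

def gini_index(labels):
    """I(A) = 1 - sum over classes k of p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels, base=2):
    """entropy(A) = -sum over classes k of p_k * log_base(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n, base)
                for c in Counter(labels).values())

# A pure rectangle scores 0 on both measures;
# a 50/50 rectangle scores 0.5 (Gini) and 1 bit (entropy).
print(gini_index(["Yes", "Yes", "No", "No"]))  # -> 0.5
print(entropy(["Yes", "Yes", "No", "No"]))     # -> 1.0
print(entropy(["Yes", "Yes", "Yes", "Yes"]))   # -> -0.0 (i.e., 0: pure)
```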
Using the principle of information entropy, build a decision tree from the training data given below. Divide the 'Credit rating' attribute into ranges as follows: (0, 1.6], (1.6, 1.7], (1.7, 1.8], (1.8, 1.9], (1.9, 2.0], (2.0, 5.0].
Sr. No. | Profession | Credit rating | Class
1  | Business | 1.60 | Buys only laptop
2  | Service  | 2.00 | Buys laptop with CD writer
3  | Business | 1.90 | Buys laptop with printer
4  | Business | 1.88 | Buys laptop with printer
5  | Business | 1.70 | Buys only laptop
6  | Service  | 1.85 | Buys laptop with printer
7  | Business | 1.60 | Buys only laptop
8  | Service  | 1.70 | Buys only laptop
9  | Service  | 2.20 | Buys laptop with CD writer
10 | Service  | 2.10 | Buys laptop with CD writer
11 | Business | 1.80 | Buys laptop with printer
12 | Service  | 1.95 | Buys laptop with printer
13 | Business | 1.90 | Buys laptop with printer
14 | Business | 1.80 | Buys laptop with printer
15 | Business | 1.75 | Buys laptop with printer
Profession

Splitting on Profession:

Profession = Business:
Sr. No. | Credit rating | Class
1 | 1.60 | Buys only laptop
2 | 1.90 | Buys laptop with printer
3 | 1.88 | Buys laptop with printer
4 | 1.70 | Buys only laptop
5 | 1.60 | Buys only laptop
6 | 1.80 | Buys laptop with printer
7 | 1.90 | Buys laptop with printer
8 | 1.80 | Buys laptop with printer
9 | 1.75 | Buys laptop with printer

Profession = Service:
Sr. No. | Credit rating | Class
1 | 2.00 | Buys laptop with CD writer
2 | 1.85 | Buys laptop with printer
3 | 1.70 | Buys only laptop
4 | 2.20 | Buys laptop with CD writer
5 | 2.10 | Buys laptop with CD writer
6 | 1.95 | Buys laptop with printer
Credit Rating

Splitting on Credit rating (first two ranges shown):

Credit rating in (0, 1.6]:
Sr. No. | Profession | Class
1 | Business | Buys only laptop
2 | Business | Buys only laptop

Credit rating in (1.6, 1.7]:
Sr. No. | Profession | Class
1 | Business | Buys only laptop
2 | Service  | Buys only laptop
• Initially there are 3 classes: Buys only laptop (4 cases), Buys laptop with CD writer (3 cases), Buys laptop with printer (8 cases). With m = 3 classes, logarithms are taken base 3, so entropy ranges between 0 and 1.

• Initial overall entropy: $E_0 = -\frac{4}{15}\log_3\frac{4}{15} - \frac{3}{15}\log_3\frac{3}{15} - \frac{8}{15}\log_3\frac{8}{15} = 0.918$

• Based on Profession: 9 Business (3 buy only laptop, 6 with printer), 6 Service (1 only laptop, 2 with printer, 3 with CD writer)

• $\text{Entropy(Profession)} = \frac{9}{15}\left(-\frac{3}{9}\log_3\frac{3}{9} - \frac{6}{9}\log_3\frac{6}{9}\right) + \frac{6}{15}\left(-\frac{1}{6}\log_3\frac{1}{6} - \frac{2}{6}\log_3\frac{2}{6} - \frac{3}{6}\log_3\frac{3}{6}\right) = 0.716$

• Information Gain (Profession) = $E_0 - \text{Entropy(Profession)} = 0.918 - 0.716 = 0.202$

• The ranges (2, 5], (0, 1.6], (1.6, 1.7], (1.7, 1.8], and (1.8, 1.9] each contain cases of a single class, so Entropy(CR (2,5]) = Entropy(CR (0,1.6]) = Entropy(CR (1.6,1.7]) = Entropy(CR (1.7,1.8]) = Entropy(CR (1.8,1.9]) = 0

• Entropy(CR (1.9,2]) $= -\frac{1}{2}\log_3\frac{1}{2} - \frac{1}{2}\log_3\frac{1}{2} = 0.631$ (one printer case at 1.95, one CD-writer case at 2.0)

• $\text{Entropy(Credit rating)} = \frac{2}{15}(0) + \frac{2}{15}(0) + \frac{3}{15}(0) + \frac{4}{15}(0) + \frac{2}{15}(0.631) + \frac{2}{15}(0) = 0.084$

• Information Gain (Credit rating) = $0.918 - 0.084 = 0.834$

• Credit rating gives the larger information gain (0.834 > 0.202), so it is chosen as the first (root) split.
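These figures can be checked numerically. A minimal sketch, assuming the 15-record training table above, base-3 entropy as noted, and illustrative variable names; exact arithmetic gives 0.919, 0.203, and 0.835, which the slide's intermediate rounding reports as 0.918, 0.202, and 0.834.

```python
import math
from collections import Counter

def entropy3(labels):
    """Entropy with log base 3 (m = 3 classes), so values lie in [0, 1]."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n, 3) for c in Counter(labels).values())

# (profession, credit rating, class) for the 15 training records.
data = [
    ("Business", 1.60, "only"), ("Service", 2.00, "cd"), ("Business", 1.90, "printer"),
    ("Business", 1.88, "printer"), ("Business", 1.70, "only"), ("Service", 1.85, "printer"),
    ("Business", 1.60, "only"), ("Service", 1.70, "only"), ("Service", 2.20, "cd"),
    ("Service", 2.10, "cd"), ("Business", 1.80, "printer"), ("Service", 1.95, "printer"),
    ("Business", 1.90, "printer"), ("Business", 1.80, "printer"), ("Business", 1.75, "printer"),
]
classes = [c for _, _, c in data]
e0 = entropy3(classes)

def weighted_entropy(groups):
    n = sum(len(g) for g in groups)
    return sum(len(g) / n * entropy3(g) for g in groups)

# Split on Profession.
by_prof = [[c for p, _, c in data if p == v] for v in ("Business", "Service")]
gain_prof = e0 - weighted_entropy(by_prof)

# Split on Credit rating, using the ranges given in the exercise.
cuts = [0, 1.6, 1.7, 1.8, 1.9, 2.0, 5.0]
by_cr = [[c for _, r, c in data if lo < r <= hi]
         for lo, hi in zip(cuts, cuts[1:])]
gain_cr = e0 - weighted_entropy(by_cr)

print(round(e0, 3), round(gain_prof, 3), round(gain_cr, 3))  # -> 0.919 0.203 0.835
```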


Credit Rating

The resulting decision tree (adjacent pure ranges merged):

• (0, 1.7] → Buys only laptop
• (1.7, 1.9] → Buys laptop with printer
• (1.9, 2] → Profession (Service): Buys laptop with CD writer (P = 0.5) / Buys laptop with printer (P = 0.5)
• (2, 5] → Buys laptop with CD writer

(Both cases in (1.9, 2] are Service, so Profession cannot separate them and each class is assigned probability 0.5.)
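Read as code, the finished tree is a small classifier. A minimal sketch, assuming the merged ranges shown above; the 50/50 leaf is returned as a probability distribution rather than a single class.

```python
def predict(credit_rating):
    """Classify one case using the final tree above.

    Returns a dict mapping class labels to probabilities, so the
    50/50 leaf in (1.9, 2] can be represented honestly.
    """
    if credit_rating <= 1.7:
        return {"Buys only laptop": 1.0}
    if credit_rating <= 1.9:
        return {"Buys laptop with printer": 1.0}
    if credit_rating <= 2.0:
        # All training cases here are Service, so Profession adds nothing.
        return {"Buys laptop with CD writer": 0.5,
                "Buys laptop with printer": 0.5}
    return {"Buys laptop with CD writer": 1.0}

print(predict(1.65))  # -> {'Buys only laptop': 1.0}
print(predict(1.95))  # -> the 50/50 leaf
```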
The content of these slides is prepared from different textbooks.

References:

• Data Mining and Predictive Analytics, By Daniel T. Larose. Copyright 2015 John Wiley & Sons, Inc.

• Predictive Analytics for Dummies, By Anasse Bari, Mohamed Chaouchi, and Tommy Jung. Copyright 2016 John Wiley & Sons, Inc.

• Introduction to Data Mining with Case Studies, By G.K. Gupta. Copyright 2014 by PHI Learning Private Limited.
Thank you.
