
Decision Tree Notes:

● A decision tree, as the name suggests, is a flow-chart-like tree structure that works on the principle of conditions.

● It is an efficient and robust algorithm used for predictive analysis.

● Its main components are internal nodes, branches, and terminal (leaf) nodes.

● Every internal node holds a "test" on an attribute, branches hold the outcomes of the test, and every leaf node represents a class label.

● It is used for both classification as well as regression. It is often termed "CART", which stands for Classification And Regression Tree (see the sketch after this list).

● Tree algorithms are widely preferred due to their stability and reliability.

● A decision tree makes decisions by splitting nodes into sub-nodes.
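As a concrete illustration of the points above, here is a minimal classification sketch using scikit-learn, whose decision trees are CART-based. The iris dataset and the hyperparameters are illustrative choices, not part of these notes:

    # Minimal CART sketch with scikit-learn (illustrative dataset/parameters).
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # criterion="entropy" scores splits by information gain; the default
    # "gini" uses Gini impurity. Both measures are covered later in the notes.
    clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))

The same API offers DecisionTreeRegressor for regression problems, which is what the "R" in CART refers to.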


Last Part: 

● Node splitting, or simply splitting, is the process of dividing a node into multiple sub-nodes to create relatively pure nodes.

● There are multiple ways of doing this, which can be broadly divided into two categories based on the type of target variable:

Categorical Target Variable

● Entropy / Gini impurity (two closely related impurity measures; a sketch of Gini impurity follows below)

● Information Gain

● Chi-Square

Continuous Target Variable

● Reduction in Variance
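As a small illustration of the first measure, here is a sketch of Gini impurity for a list of class labels; the function name and the example label sets are our own, not from the notes:

    # Gini impurity: 1 - sum(p_k^2) over the class proportions p_k of a node.
    from collections import Counter

    def gini_impurity(labels):
        n = len(labels)
        return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

    print(gini_impurity(["yes", "yes", "yes"]))       # 0.0 -> pure node
    print(gini_impurity(["yes", "yes", "no", "no"]))  # 0.5 -> maximally mixed (two classes)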

The algorithm can be summarized as:

1. At each stage (node), pick the best feature as the test condition.
2. Split the node into its possible outcomes (internal nodes).
3. Repeat the above steps until every test condition terminates in a leaf node (a recursive sketch follows below).
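The three steps above amount to a short recursive procedure. In the self-contained sketch below (all names and the toy data are our own, not from the notes), candidate splits are scored by the weighted Gini impurity of the resulting children; scoring by information gain, described next, works the same way:

    from collections import Counter

    def gini(labels):
        # Impurity of a node: 0 for a pure node, higher for mixed labels.
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    def partition_by(rows, feature):
        # One group of rows per observed value (outcome) of the feature.
        groups = {}
        for row in rows:
            groups.setdefault(row[feature], []).append(row)
        return groups

    def best_feature(rows, features):
        # Step 1: pick the test condition whose split leaves the purest children.
        n = len(rows)
        def score(f):
            return sum(len(g) / n * gini([r["label"] for r in g])
                       for g in partition_by(rows, f).values())
        return min(features, key=score)

    def build_tree(rows, features):
        labels = [r["label"] for r in rows]
        if len(set(labels)) == 1 or not features:
            # Step 3: stop at a leaf when the node is pure or no tests remain.
            return Counter(labels).most_common(1)[0][0]
        f = best_feature(rows, features)
        rest = [x for x in features if x != f]
        # Step 2: split the node into one internal branch per outcome.
        return {f: {v: build_tree(g, rest)
                    for v, g in partition_by(rows, f).items()}}

    # Toy data, made up for illustration:
    rows = [
        {"outlook": "sunny", "windy": "no",  "label": "play"},
        {"outlook": "sunny", "windy": "yes", "label": "stay"},
        {"outlook": "rain",  "windy": "no",  "label": "play"},
    ]
    print(build_tree(rows, ["outlook", "windy"]))  # {'windy': {'no': 'play', 'yes': 'stay'}}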

When you start to implement the algorithm, the first question is: 'How to pick the starting test condition?'

Choosing the best feature or attribute for the split

The answer to this question lies in the values of 'Entropy' and 'Information Gain'. Let us see what they are and how they impact the creation of our decision tree.

Entropy: Entropy in a decision tree stands for homogeneity. If the data is completely homogeneous, the entropy is 0; if the data is divided evenly (50-50%), the entropy is 1.
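A quick numerical check of these two boundary cases (the snippet and its label sets are ours, for illustration):

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy: sum of -p_k * log2(p_k) over class proportions p_k.
        n = len(labels)
        return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

    print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 -> completely homogeneous
    print(entropy(["yes", "yes", "no", "no"]))    # 1.0 -> divided 50-50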

Information Gain: Information Gain is the decrease in the entropy value when the node is split.

● The attribute with the highest information gain is selected for splitting.

● Based on the computed values of entropy and information gain, we choose the best attribute at each particular step (a worked sketch follows below).
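Putting the two definitions together, the sketch below computes the information gain of each attribute on a toy dataset and picks the best one; the rows and all names are illustrative assumptions:

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(rows, feature):
        # Decrease in entropy: parent entropy minus the weighted entropy
        # of the sub-nodes produced by splitting on `feature`.
        parent = entropy([r["label"] for r in rows])
        groups = {}
        for r in rows:
            groups.setdefault(r[feature], []).append(r["label"])
        n = len(rows)
        children = sum(len(g) / n * entropy(g) for g in groups.values())
        return parent - children

    rows = [
        {"outlook": "sunny", "windy": "no",  "label": "play"},
        {"outlook": "sunny", "windy": "yes", "label": "stay"},
        {"outlook": "sunny", "windy": "no",  "label": "play"},
        {"outlook": "rain",  "windy": "no",  "label": "play"},
        {"outlook": "rain",  "windy": "yes", "label": "play"},
    ]
    gains = {f: round(information_gain(rows, f), 3) for f in ["outlook", "windy"]}
    print(gains)                      # {'outlook': 0.171, 'windy': 0.322}
    print(max(gains, key=gains.get))  # windy -> highest gain, chosen for the split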
