Decision Tree Algorithm
Piranti Cerdas 2021

https://towardsai.net/p/programming/decision-trees-explained-with-a-practical-example-fe47872d3b53
Introduction
A decision tree is a supervised machine learning algorithm. It can be used for both regression and classification problems, yet it is mostly used for classification. A decision tree follows a set of if-else conditions to visualize the data and classify it according to those conditions.
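As a minimal illustration of this idea, a decision tree is just nested if-else conditions. The attribute names and thresholds below are invented for the example:

```python
# A hypothetical hand-written decision tree for loan approval.
# The attributes and thresholds are invented for illustration only.
def approve_loan(income: float, credit_history: bool) -> str:
    if credit_history:              # decision node on credit history
        if income >= 3000:          # decision node on income
            return "approve"        # leaf node
        return "review"             # leaf node
    return "reject"                 # leaf node

print(approve_loan(income=4500, credit_history=True))   # approve
print(approve_loan(income=2000, credit_history=False))  # reject
```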
Overview
Root Node: The attribute used for dividing the data into two or more sets. The feature at this node is selected based on an attribute selection technique.
Branch or Sub-Tree: A part of the entire decision tree is called a branch or sub-tree.
Splitting: Dividing a node into two or more sub-nodes based on if-else conditions.
Decision Node: A sub-node that is split into further sub-nodes is called a decision node.
Leaf or Terminal Node: The end of the decision tree, where a node cannot be split into further sub-nodes.
Pruning: Removing a sub-node from the tree is called pruning.
Working of Decision Tree
1. The root node feature is selected based on the results of an Attribute Selection Measure (ASM).
2. Splitting based on the ASM is repeated until a leaf (terminal) node is reached, i.e., a node that cannot be split into further sub-nodes.
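As a minimal sketch of these two steps in practice, scikit-learn's DecisionTreeClassifier performs the attribute selection and recursive splitting internally; its criterion parameter chooses between the two ASM techniques discussed next. The toy data below is invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: [income (thousands), has_credit_history (0/1)] -> loan approved (0/1).
# The values are invented for illustration.
X = [[5, 1], [3, 1], [4, 0], [1, 0], [2, 1], [6, 0]]
y = [1, 1, 0, 0, 1, 0]

# criterion="gini" uses the Gini index; criterion="entropy" uses information gain.
tree = DecisionTreeClassifier(criterion="gini", random_state=0)
tree.fit(X, y)

# Print the learned if-else structure of the tree.
print(export_text(tree, feature_names=["income", "credit_history"]))
```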
What is an Attribute Selection Measure (ASM)?
An attribute selection measure is a technique used in the data mining process for data reduction. Data reduction is necessary for better analysis and prediction of the target variable.
The two main ASM techniques are:

+ Gini index
+ Information Gain (ID3)
Gini index
The Gini index, or Gini impurity, measures the probability of a particular variable being wrongly classified when it is chosen at random.
When the Gini index is used as the criterion for selecting the feature for the root node, the feature with the lowest Gini index is selected.

Gini = 1 − Σ (Pi)², summed over all classes, where
Pi = probability of an object being classified into a particular class.
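A short sketch of this formula in Python; the example labels are invented for illustration:

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity: 1 - sum of (Pi)^2 over all classes in `labels`."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

# A pure node has impurity 0; a 50/50 split has the maximum impurity 0.5.
print(gini_index(["yes", "yes", "yes", "yes"]))  # 0.0
print(gini_index(["yes", "yes", "no", "no"]))    # 0.5
```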
Information Gain (ID3)
Entropy is the main concept of this algorithm. The measure that selects the feature or attribute giving the maximum information about a class is called information gain, and the ID3 algorithm is built on it. By using this method, we can reduce the level of entropy from the root node to the leaf nodes.

E(S) = Σ −p log₂(p), summed over all classes, where p denotes the probability of a class and E(S) denotes the entropy. The feature or attribute with the highest information gain is used as the root for the splitting.
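A sketch of entropy and information gain in Python, following the formulas above; the example labels are invented:

```python
import math
from collections import Counter

def entropy(labels):
    """E(S) = sum over classes of -p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    """Entropy of the parent node minus the weighted entropy of its children."""
    n = len(parent_labels)
    weighted = sum(len(g) / n * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted

parent = ["yes", "yes", "yes", "no", "no", "no"]
# A split that separates the classes perfectly gains the full parent entropy.
print(information_gain(parent, [["yes", "yes", "yes"], ["no", "no", "no"]]))  # 1.0
```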
Example
Dataset:
https://www.kaggle.com/madhansing/bank-loan2?select=madfhantr.csv
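A hedged sketch of how this dataset could be used once downloaded. The file name comes from the URL above, but the column names ("ApplicantIncome", "LoanAmount", "Credit_History", "Loan_Status") are assumptions about the dataset's schema and may need adjusting:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Assumes madfhantr.csv has been downloaded from the Kaggle link above.
# The column names below are assumptions and may need adjusting.
df = pd.read_csv("madfhantr.csv")

features = ["ApplicantIncome", "LoanAmount", "Credit_History"]  # assumed numeric columns
df = df.dropna(subset=features + ["Loan_Status"])               # assumed target column

X = df[features]
y = df["Loan_Status"]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)
print(tree.score(X, y))  # training accuracy
```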
