Machine Learning With Python - Machine Learning Algorithms - Decision Tree
In general, decision tree analysis is a predictive modelling tool that can be applied across many areas. Decision trees can be constructed by an algorithmic approach that splits the dataset in different ways based on different conditions.
Decision trees are among the most powerful algorithms that fall under the category of supervised algorithms.
The two main entities of a tree are decision nodes, where the data is split, and leaves, where we get the outcome.
An example of a binary tree for predicting whether a person is fit or unfit, given information such as age, eating habits and exercise habits, is shown below:
[Figure: binary decision tree that splits on Yes/No answers to classify a person as fit or unfit]
· Classification decision trees: In this kind of decision tree, the decision variable is categorical. The above decision tree is an example of a classification decision tree.
· Regression decision trees: In this kind of decision tree, the decision variable is continuous (a minimal scikit-learn sketch of both kinds follows).
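As a quick illustration of the two kinds, scikit-learn exposes a separate estimator for each; the following is a minimal sketch with toy data (all values here are assumed purely for illustration):
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy feature matrix, assumed for illustration: [age, exercises-daily flag]
X = [[25, 0], [40, 1], [30, 1], [55, 0]]
y_class = ['unfit', 'fit', 'fit', 'unfit']   # categorical target -> classification tree
y_reg = [55.0, 72.5, 68.0, 50.5]             # continuous target -> regression tree

DecisionTreeClassifier().fit(X, y_class)
DecisionTreeRegressor().fit(X, y_reg)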
Gini Index
It is the name of the cost function that is used to evaluate binary splits in the dataset and works with a categorical target variable such as "Success" or "Failure".
The higher the value of the Gini index, the lower the homogeneity: a perfect Gini index value is 0 and the worst is 0.5 (for a 2-class problem). The Gini index for a split can be calculated with the help of the following steps:
· Part 1: Calculating Gini score: First, calculate the Gini index for each sub-node using the formula 1 - (p^2 + q^2), where p^2 + q^2 is the sum of the squares of the probabilities of success and failure. Then calculate the Gini index for the split as the weighted average of the sub-node scores, weighted by sub-node size.
· Part 2: Splitting a dataset: It may be defined as separating a dataset into two lists of rows, given the index of an attribute and a split value for that attribute. After getting the two groups, right and left, from the dataset, we can calculate the value of the split using the Gini score calculated in the first part. The split value decides in which group each row will reside.
· Part 3: Evaluating all splits: After calculating the Gini score and splitting the dataset, the next part is the evaluation of all splits. For this purpose, we must check every value of each attribute as a candidate split, then find the best possible split by evaluating its cost. The best split will be used as a node in the decision tree (a from-scratch sketch of all three parts follows this list).
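Putting the three parts together, a minimal from-scratch sketch in Python might look like the following. It assumes each row is a list with the class label in the last column; all function and variable names here are our own, for illustration only:
def gini_index(groups, classes):
    # Part 1: weighted Gini index of a candidate split (two groups of rows)
    n_instances = float(sum(len(group) for group in groups))
    gini = 0.0
    for group in groups:
        size = float(len(group))
        if size == 0:
            continue
        score = 0.0
        for class_val in classes:
            p = [row[-1] for row in group].count(class_val) / size
            score += p * p                           # p^2 + q^2 in the 2-class case
        gini += (1.0 - score) * (size / n_instances)  # weight by group size
    return gini

def test_split(index, value, dataset):
    # Part 2: separate rows into left/right lists on one attribute's value
    left = [row for row in dataset if row[index] < value]
    right = [row for row in dataset if row[index] >= value]
    return left, right

def get_split(dataset):
    # Part 3: evaluate every (attribute, value) candidate and keep the best
    class_values = list(set(row[-1] for row in dataset))
    best_index, best_value, best_score, best_groups = None, None, float('inf'), None
    for index in range(len(dataset[0]) - 1):
        for row in dataset:
            groups = test_split(index, row[index], dataset)
            gini = gini_index(groups, class_values)
            if gini < best_score:
                best_index, best_value, best_score, best_groups = index, row[index], gini, groups
    return {'index': best_index, 'value': best_value, 'groups': best_groups}
Note that get_split exhaustively tries every attribute value as a split point, which is simple to follow but quadratic in the number of rows per node.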
Building a Tree
As we know, a tree has a root node and terminal nodes. After creating the root node, we can build the tree by following two parts: terminal node creation and recursive splitting.
While creating terminal nodes, the important point is to decide when to stop growing the tree. Two criteria are commonly used:
· Maximum Tree Depth: the maximum number of nodes in the tree after the root node. We must stop adding terminal nodes once the tree reaches the maximum depth.
· Minimum Node Records: the minimum number of training patterns that a given node is responsible for. We must stop adding terminal nodes once a node reaches this minimum number of records or falls below it.
A terminal node is used to make a final prediction.
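Continuing the from-scratch sketch above, a terminal node can simply predict the most common class among the rows it is responsible for (function name assumed):
def to_terminal(group):
    # A terminal node predicts the most common outcome among its rows
    outcomes = [row[-1] for row in group]
    return max(set(outcomes), key=outcomes.count)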
Recursive Splitting
Now that we understand when to create terminal nodes, we can start building the tree. Recursive splitting is a method to build the tree: once a node is created, we create its child nodes (nodes added to an existing node) recursively on each group of data generated by splitting the dataset, by calling the same function again and again.
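Continuing the same sketch, the recursive routine below creates child nodes from the groups produced by get_split, falling back to to_terminal when either stopping criterion is met (names and structure are our own, building on the helpers above):
def split(node, max_depth, min_size, depth):
    # Recursively grow child nodes, honoring the two stopping criteria
    left, right = node['groups']
    del node['groups']
    if not left or not right:
        # One side is empty: no useful split, so make a terminal from all rows
        node['left'] = node['right'] = to_terminal(left + right)
        return
    if depth >= max_depth:
        # Maximum tree depth reached
        node['left'], node['right'] = to_terminal(left), to_terminal(right)
        return
    for side, group in (('left', left), ('right', right)):
        if len(group) <= min_size:
            node[side] = to_terminal(group)   # minimum node records reached
        else:
            node[side] = get_split(group)     # find the best split for this group
            split(node[side], max_depth, min_size, depth + 1)

def build_tree(train, max_depth, min_size):
    # Create the root node, then grow the tree recursively
    root = get_split(train)
    split(root, max_depth, min_size, 1)
    return root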
Prediction
After building a decision tree, we need to make predictions with it. Basically, prediction involves navigating the decision tree with a specifically provided row of data.
We can make a prediction with the help of a recursive function, as above: the same prediction routine is called again with the left or the right child node.
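In the same from-scratch sketch, the recursive prediction routine walks the tree with one row of data until it reaches a terminal node:
def predict(node, row):
    # Follow the branch matching this row's value for the node's split attribute
    if row[node['index']] < node['value']:
        branch = node['left']
    else:
        branch = node['right']
    if isinstance(branch, dict):
        return predict(branch, row)   # internal node: keep descending
    return branch                     # terminal node: its stored class is the prediction

# Example usage on toy data (values assumed): rows are [feature, label]
dataset = [[2.7, 0], [1.3, 0], [3.6, 0], [7.5, 1], [9.0, 1], [7.4, 1]]
tree = build_tree(dataset, max_depth=3, min_size=1)
print(predict(tree, [6.5, None]))  # expected: 0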
· The decision tree classifier prefers feature values to be categorical. If you want to use continuous values, they must be discretized prior to model building (one way to do this is sketched after this list).
· A statistical approach is used to place attributes at any node position, i.e. as the root node or an internal node.
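As one way to perform that discretization, scikit-learn's KBinsDiscretizer can bin continuous columns into ordinal categories; the following is a minimal sketch, with the toy data, bin count and binning strategy all assumed:
from sklearn.preprocessing import KBinsDiscretizer

# Toy continuous features (e.g. age and BMI), assumed for illustration
X = [[25, 26.6], [40, 28.1], [30, 31.2], [55, 22.4]]

# Bin each column into 4 equal-width ordinal categories
discretizer = KBinsDiscretizer(n_bins=4, encode='ordinal', strategy='uniform')
X_binned = discretizer.fit_transform(X)
print(X_binned)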
Sample rows of the Pima Indians Diabetes dataset (index, pregnant, glucose, bp, skin, insulin, bmi, pedigree, age, label):
1  1  85  66  29   0  26.6  0.351  31  0
3  1  89  66  23  94  28.1  0.167  21  0
Now, split the dataset into features and target variable as follows:
X = pima[feature_cols] # Features
y = pima.label # Target variable
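The lines above assume the dataset has already been loaded into a pandas DataFrame named pima and that feature_cols is defined; the training step below also assumes a train/test split. A minimal sketch of those assumed steps (the file name, column names and split ratio are illustrative, not taken from the original):
import pandas as pd
from sklearn.model_selection import train_test_split

# Column names assumed for the standard Pima Indians Diabetes CSV
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
pima = pd.read_csv("pima-indians-diabetes.csv", header=None, names=col_names)

feature_cols = col_names[:-1]   # every column except the label
X = pima[feature_cols]          # Features
y = pima.label                  # Target variable

# Hold out 30% of the rows for testing (split ratio and seed assumed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)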
Next, train the model with the help of the DecisionTreeClassifier class of sklearn as follows:
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
At last, we need to make predictions. It can be done with the help of the following script:
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
y_pred = clf.predict(X_test)
result = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(result)
result1 = classification_report(y_test, y_pred)
print("Classification Report:")
print(result1)
result2 = accuracy_score(y_test, y_pred)
print("Accuracy:", result2)
Confusion Matrix:
[[116  30]
 [ 46  39]]
Classification Report:
precision recall f1-score support
Accuracy:
0.670995670995671
The above decision tree can be visualized with the help of the following code:
from io import StringIO
from sklearn.tree import export_graphviz
import pydotplus
dot_data = StringIO()
export_graphviz(clf, out_file=dot_data, filled=True, rounded=True,
special_characters=True, feature_names=feature_cols, class_names=['0', '1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('Pima_diabetes_Tree.png')
Alternatively, the tree can be plotted directly with scikit-learn and matplotlib:
from sklearn import tree
from matplotlib import pyplot as plt
fig = plt.figure(figsize=(25, 20))
tree.plot_tree(clf, feature_names=feature_cols, class_names=['0', '1'], filled=True)
plt.show()
Thank You