Lecture 7.2 - DTC Algorithm Implementation

Uploaded by suryapratp369

DTC Algorithm Implementation

1. Data Pre-Processing Step:

Below is the code for the pre-processing step:

```python
# Importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# Importing the dataset
data_set = pd.read_csv('user_data.csv')

# Extracting the independent variables (Age, Estimated Salary) and the dependent variable
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# Splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
```

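In the above code, we have pre-processed the data: we loaded the dataset from user_data.csv, took Age and Estimated Salary as the independent variables and the purchase decision as the dependent variable, split the data into a training set (75%) and a test set (25%), and standardized both feature columns.
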
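StandardScaler standardizes each column as z = (x - mean) / standard deviation. The important detail in the code above is that the mean and standard deviation are learned from the training set only (fit_transform) and then reused, unchanged, for the test set (transform). The sketch below reproduces that idea for a single column in plain Python. The ages are hypothetical, it uses the population standard deviation as StandardScaler does, and it assumes the column is not constant (scikit-learn handles a zero standard deviation separately).

```python
from statistics import mean, pstdev

def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature column."""
    return mean(column), pstdev(column)

def transform(column, mu, sigma):
    """Standardize values with previously learned statistics: z = (x - mu) / sigma."""
    return [(v - mu) / sigma for v in column]

# Hypothetical ages, not taken from user_data.csv
train_ages = [19, 35, 26, 27, 19, 27]
test_ages = [30, 45]

mu, sigma = fit_scaler(train_ages)                # statistics come from the training set only
train_scaled = transform(train_ages, mu, sigma)   # corresponds to fit_transform(x_train)
test_scaled = transform(test_ages, mu, sigma)     # corresponds to transform(x_test)

print(mu)  # 25.5
```

Reusing the training statistics on the test set keeps any information about the test data out of the preprocessing step.
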
2. Fitting a Decision-Tree algorithm to the Training set
• Now we will fit the model to the training set. For this, we will import the DecisionTreeClassifier class from the sklearn.tree module.
• Below is the code for it:

```python
# Fitting the Decision Tree classifier to the training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)
```

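In the above code, we have created a classifier object and passed it two main parameters:
o criterion='entropy': the criterion measures the quality of a split. With 'entropy', the tree chooses the split with the highest information gain, i.e. the largest drop in entropy. (The default criterion is 'gini'.)
o random_state=0: fixes the seed of the estimator's random number generator, which controls the random permutation of features at each split, so the same tree is built on every run.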
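
To make criterion='entropy' concrete: the entropy of a node is H = -Σ p_i log2 p_i over the class proportions p_i, and the information gain of a split is the parent's entropy minus the size-weighted entropy of its children. The sketch below computes both in plain Python for two hypothetical splits of eight labels. It illustrates the quantity the tree maximizes when choosing a split; it is not scikit-learn's internal code.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical labels (1 = purchased, 0 = not purchased)
parent = [0, 0, 0, 0, 1, 1, 1, 1]
split_a = ([0, 0, 0, 0], [1, 1, 1, 1])  # separates the classes perfectly
split_b = ([0, 0, 1, 1], [0, 0, 1, 1])  # leaves both children fully mixed

print(information_gain(parent, split_a))  # 1.0
print(information_gain(parent, split_b))  # 0.0
```

Between these two candidates, a tree using the entropy criterion would pick split_a.
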

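Below is the output for this:

DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=0, splitter='best')

This output comes from an older scikit-learn release. Newer versions no longer have the presort and min_impurity_split parameters, and by default they display only the parameters that differ from their defaults: DecisionTreeClassifier(criterion='entropy', random_state=0).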

3. Predicting the test set result

Now we will predict the test set result. We will create a new prediction vector y_pred. Below is the
code for it:

```python
# Predicting the test set result
y_pred = classifier.predict(x_test)
```

Output:

• The output compares the predicted vector (y_pred) with the real test vector (y_test).
• We can clearly see that some values in the prediction vector differ from the real values. These are prediction errors.

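The prediction step itself is simple once the tree has been built: each sample starts at the root and follows one threshold test per level until it reaches a leaf, whose class becomes the prediction. The sketch below walks a small hand-made tree. The Node class, the thresholds and the samples are all hypothetical (scikit-learn keeps the fitted tree in classifier.tree_ as arrays rather than node objects), but it follows the same convention of going left when the feature value is less than or equal to the threshold.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None    # index of the feature tested at this node
    threshold: float = 0.0           # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None      # set only on leaf nodes

def predict_one(node, x):
    """Walk from the root to a leaf, applying one threshold test per level."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label

def predict(root, rows):
    return [predict_one(root, x) for x in rows]

# Hypothetical tree on (scaled Age, scaled Estimated Salary); thresholds are made up
root = Node(
    feature=0, threshold=0.6,
    left=Node(feature=1, threshold=0.5, left=Node(label=0), right=Node(label=1)),
    right=Node(label=1),
)

print(predict(root, [[-1.0, -0.2], [-0.5, 1.2], [1.5, -1.0]]))  # [0, 1, 1]
```
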
4. Test accuracy of the result (creating the confusion matrix)
• In the above output, we saw that some predictions were incorrect. To count the correct and incorrect predictions, we use a confusion matrix.
• Below is the code for it:

```python
# Creating the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
```

Output:

The confusion matrix shows 6 + 3 = 9 incorrect predictions (the off-diagonal entries) and 62 + 29 = 91 correct predictions (the diagonal entries).

Therefore, the Decision Tree classifier predicts 91 of the 100 test samples correctly (an accuracy of 91%), which is a good result compared to other classification models.
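
A confusion matrix is a count of (actual, predicted) pairs. In scikit-learn's layout, rows are the actual classes and columns are the predicted classes, so correct predictions sit on the diagonal and accuracy is the diagonal sum divided by the total. Below is a plain-Python sketch with hypothetical label vectors (the helper names are illustrative, not scikit-learn functions), followed by the same accuracy arithmetic applied to the counts reported above.

```python
def confusion_matrix_counts(y_true, y_pred, labels=(0, 1)):
    """Rows are actual classes, columns are predicted classes (same layout as sklearn)."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm

def accuracy_from_cm(cm):
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Hypothetical vectors, not the lecture's test set
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
cm = confusion_matrix_counts(y_true, y_pred)
print(cm)                    # [[2, 1], [1, 2]]
print(accuracy_from_cm(cm))  # 0.6666666666666666

# The lecture's counts: 62 + 29 correct, 6 + 3 incorrect
print((62 + 29) / (62 + 29 + 6 + 3))  # 0.91
```
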

5. Visualizing the training set result:

• To visualize the training set result, we will plot the decision regions of the decision tree classifier.
• The classifier predicts Yes or No for each user, i.e. whether the user purchased the SUV car or not, as we did in Logistic Regression.
• Below is the code for it:
```python
# Visualizing the training set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
# Grid covering the feature space with a step of 0.01
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
# Predict every grid point and colour the resulting regions
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
# Plot the actual points, one colour per class
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Decision Tree Algorithm (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
```

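Output: a plot titled "Decision Tree Algorithm (Training set)", with Age on the x-axis and Estimated Salary on the y-axis.

• The above output differs clearly from the decision boundaries of the other classification models.
• It has vertical and horizontal lines that split the dataset according to the Age and Estimated Salary variables. The boundaries are axis-parallel because each node of the tree tests a single feature against a threshold.
• As we can see, the tree tries to capture every single training data point, which is a case of overfitting.
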
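The overfitting described above comes from letting the tree grow until every leaf is pure, which is what scikit-learn does with its default max_depth=None. The from-scratch sketch below is not scikit-learn: it is a minimal greedy tree built from the same ingredients (entropy, threshold splits), run on hypothetical one-feature data with two noisy labels. With no depth limit it reaches 100% training accuracy by carving out a region for each noisy point; limited to a single split, it accepts two training errors.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Try a midpoint threshold between consecutive distinct values of every feature
    and return the (feature, threshold) with the highest information gain."""
    best, best_gain = None, -1.0
    parent = entropy(labels)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if parent - child > best_gain:
                best, best_gain = (f, t), parent - child
    return best  # None when all rows have identical feature values

def build(rows, labels, max_depth=None, depth=0):
    """A leaf is a class label; an internal node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or (max_depth is not None and depth >= max_depth):
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    f, t = split
    left_rows = [r for r in rows if r[f] <= t]
    left_labels = [y for r, y in zip(rows, labels) if r[f] <= t]
    right_rows = [r for r in rows if r[f] > t]
    right_labels = [y for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build(left_rows, left_labels, max_depth, depth + 1),
            build(right_rows, right_labels, max_depth, depth + 1))

def predict_one(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def train_accuracy(tree, rows, labels):
    return sum(predict_one(tree, r) == y for r, y in zip(rows, labels)) / len(labels)

# Hypothetical data: mostly "larger x -> 1", with noisy labels at x = 3 and x = 7
rows = [[1], [2], [3], [4], [5], [6], [7], [8]]
labels = [0, 0, 1, 0, 1, 1, 0, 1]

shallow = build(rows, labels, max_depth=1)
full = build(rows, labels)  # no depth limit, like max_depth=None

print(train_accuracy(shallow, rows, labels))  # 0.75
print(train_accuracy(full, rows, labels))     # 1.0
```

Perfect training accuracy here says nothing about new data. In scikit-learn, the corresponding knob is the max_depth parameter of DecisionTreeClassifier, and whether a limit helps must be judged on the test set.
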
6. Visualizing the test set result:
Visualization of the test set result is similar to that of the training set, except that the training set is replaced with the test set.

```python
# Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
# Grid covering the feature space with a step of 0.01
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
# Predict every grid point and colour the resulting regions
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
# Plot the actual test points, one colour per class
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Decision Tree Algorithm (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
```

Output:
• As we can see in the plot, there are some green data points within the purple region and vice versa.
• These are the incorrect predictions that we counted in the confusion matrix.
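
Both plots rely on the same trick in the contourf line: build a fine grid over the feature space, predict a class for every grid point, and reshape the predictions back to the grid's shape so that contourf can colour the regions. The .T turns the stacked (2, n) array into (n, 2), one row per grid point, which is the shape predict expects. The sketch below shows the shapes involved on a coarse grid; toy_predict is a hypothetical stand-in for classifier.predict, so the example runs without the dataset.

```python
import numpy as nm  # same alias as the lecture code

# A coarse grid instead of step=0.01, so the shapes are easy to read
x1, x2 = nm.meshgrid(nm.arange(-1.0, 1.0, 0.5), nm.arange(-1.0, 0.5, 0.5))
print(x1.shape)  # (3, 4): one row per x2 value, one column per x1 value

# Flatten both grids and stack them as a 2-column feature matrix
points = nm.array([x1.ravel(), x2.ravel()]).T
print(points.shape)  # (12, 2)

def toy_predict(X):
    """Hypothetical stand-in for classifier.predict: class 1 when x1 + x2 > 0, else 0."""
    return (X[:, 0] + X[:, 1] > 0).astype(int)

# One prediction per grid point, folded back into the grid's shape for contourf
Z = toy_predict(points).reshape(x1.shape)
print(Z.shape)  # (3, 4)
```

With step=0.01 on standardized features, the same code predicts tens of thousands of grid points, which is why the decision regions in the plots look smooth.
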
