Random Forest Reference Code
Random Forest
#Random Forest model
library(randomForest)  # provides randomForest()
modelrf <- randomForest(as.factor(left) ~ ., data = trainSplit, do.trace = TRUE)
modelrf
The random forest output tells us that the model built 500 trees and sampled 3 candidate variables at each split. The out-of-bag (OOB) estimate of the generalization error is the error rate of the out-of-bag classifier on the training set. The OOB estimate is as accurate as using a test set of the same size as the training set, so the out-of-bag error estimate removes the need for a set-aside test set.
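These quantities can be read directly off the fitted object; a small sketch, using the modelrf object from above:

# Trees built, candidate variables per split, and the final OOB error
modelrf$ntree                                    # number of trees (500)
modelrf$mtry                                     # variables tried at each split (3)
modelrf$err.rate[nrow(modelrf$err.rate), "OOB"]  # OOB error of the full forest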
Random Forest
#Checking variable importance in Random Forest
importance(modelrf)   # importance scores per variable
varImpPlot(modelrf)   # dot plot of variable importance
Random Forest
# Prediction and Model Evaluation using Confusion Matrix
library(caret)  # provides confusionMatrix()
predrf_tr <- predict(modelrf, trainSplit)    # train data
predrf_test <- predict(modelrf, testSplit)   # test data
confusionMatrix(predrf_tr, as.factor(trainSplit$left))    # train performance
confusionMatrix(predrf_test, as.factor(testSplit$left))   # test performance
As we observe, the model shows similar performance on the train and test data, which gives us confidence in the stability of our Random Forest model.
Comparing ROC curves for Decision Tree and Random Forest
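The comparison itself was shown as a plot; a minimal sketch of how such a plot can be drawn with pROC (listed in the appendix), where modeldt is an assumed name for the decision-tree model built earlier:

library(pROC)
# Class-1 probabilities from each model on the test data
probrf <- predict(modelrf, testSplit, type = "prob")[, 2]
probdt <- predict(modeldt, testSplit, type = "prob")[, 2]  # modeldt: assumed name
rocrf <- roc(testSplit$left, probrf)
rocdt <- roc(testSplit$left, probdt)
plot(rocrf, col = "blue", main = "ROC: Decision Tree vs Random Forest")
lines(rocdt, col = "red")
auc(rocrf); auc(rocdt)  # compare areas under the curves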
Classification Model
Naïve Bayes
Naïve Bayes
#Naive Bayes
library(e1071)  # provides naiveBayes()
modelnb <- naiveBayes(as.factor(left) ~ ., data = trainSplit)
modelnb
The output shows the a-priori probabilities of the classes, followed by the conditional probabilities of each variable given the class.
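These components can also be pulled from the fitted object; a small sketch, using the modelnb object from above:

modelnb$apriori  # class counts that define the a-priori probabilities
modelnb$tables   # conditional means/probabilities of each variable given the class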
Naïve Bayes
#Performance of Naïve Bayes using Confusion Matrix
prednb_tr <- predict(modelnb, trainSplit)    # train data
prednb_test <- predict(modelnb, testSplit)   # test data
confusionMatrix(prednb_tr, as.factor(trainSplit$left))    # train performance
confusionMatrix(prednb_test, as.factor(testSplit$left))   # test performance
As we observe, the model shows similar performance on the train and test data, which gives us confidence in the stability of our Naïve Bayes model.
Classification Model
kNN Algorithm
kNN Algorithm
#Data Preparation for kNN Algorithm
library(dummies)
#Creating dummy variables for the factor columns
dummy_df <- dummy.data.frame(hr_data1[, c('role_code', 'salary.code')])
hr_data2 <- cbind.data.frame(hr_data1, dummy_df)
kNN Algorithm
#Data Preparation for kNN Algorithm
#Check the scaled dataset (kNN is distance-based, so features are scaled first)
str(hr_data2_scaled)
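The step that produces hr_data2_scaled is not shown here; a minimal sketch, assuming the numeric columns of hr_data2 are standardised with base R's scale():

# Assumed scaling step: standardise the numeric columns so that no
# single feature dominates the kNN distance calculation
num_cols <- sapply(hr_data2, is.numeric)
hr_data2_scaled <- hr_data2
hr_data2_scaled[num_cols] <- scale(hr_data2[num_cols])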
kNN Algorithm
#Applying kNN Algorithm on the dataset
library(class)    # provides knn()
library(gmodels)  # provides CrossTable()
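The knn() call itself does not appear on the slide; a sketch under assumed names, where hr_train and hr_test are the train/test splits of the scaled data (hr_train also appears on a later slide) and train_labels / test_labels hold the corresponding values of left:

# k nearest neighbours with k = 5; the split and label objects are
# assumptions for illustration
prednn <- knn(train = hr_train, test = hr_test, cl = train_labels, k = 5)
CrossTable(x = test_labels, y = prednn, prop.chisq = FALSE)  # confusion table
mean(prednn == test_labels)  # overall accuracy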
Accuracy = (TP + TN) / Total = (3311 + 1030) / 4499 = 96.48%
kNN Algorithm
#Applying kNN Algorithm on the dataset

  k     Accuracy
  5     94.46%
 10     94.17%
 50     90.19%
100     86.48%
122     85.06%
From the accuracy table above, we can observe that the accuracy decreases as the value of k increases.
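A table like this can be produced with a simple loop; a sketch, reusing the assumed objects from the kNN sketch above:

# Accuracy for a range of k values
for (k in c(5, 10, 50, 100, 122)) {
  pred <- knn(train = hr_train, test = hr_test, cl = train_labels, k = k)
  cat("k =", k, ": accuracy =", round(100 * mean(pred == test_labels), 2), "%\n")
}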
kNN Algorithm
# A common rule of thumb for choosing k in kNN is sqrt(n)/2
k <- sqrt(nrow(hr_train)) / 2
k
# 51.2347, which can be rounded to 51
Step 6
Model Summarization
Summary of Model Performance
Model                         Accuracy
Decision Tree                 97.09%
Random Forest                 99%
Naïve Bayes                   78.84%
kNN Algorithm (using k = 7)   96.84%
Appendix
Packages used for the Classification Analysis:
• data.table    # data manipulation
• reshape2      # data reshaping
• randomForest  # random forest models
• party         # decision trees (ctree)
• rpart         # decision trees (rpart)
• rpart.plot    # plotting rpart trees
• lattice       # data visualization
• caret         # data pre-processing and confusion matrix
• pROC          # ROC curves
• corrplot      # correlation plots
• e1071         # Naïve Bayes (naiveBayes())
• RColorBrewer  # color palettes
• dummies       # dummy variables
• class         # kNN (knn())
• gmodels       # cross tables (CrossTable())
Thank You.