
Tutorial 6

DSA1101
Introduction to Data Science
October 12, 2018

Exercise 1. n-fold cross-validation for decision trees


Recall that we studied n-fold cross-validation for the k-nearest-neighbor
classifier, where the value of k is varied to control the complexity of the
classifier's decision surface. Decision tree classification has a similar
complexity parameter, denoted Cp. Heuristically, smaller values of Cp
correspond to larger trees, and hence more complex decision surfaces. In
this week's tutorial, we will investigate n-fold cross-validation for a
decision tree classifier.
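
To see this relationship concretely, here is a minimal sketch (using the
kyphosis dataset that ships with rpart, not the tutorial data, so the exact
node counts are only illustrative): a tree grown with a very small cp has
many more nodes than one grown with a large cp.

library("rpart")

## a very small cp accepts splits with tiny improvements, growing a large tree;
## minsplit = 2 also removes the default minimum-node-size restriction
fit.small.cp <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                      method = "class",
                      control = rpart.control(cp = 1e-5, minsplit = 2))

## a large cp only accepts splits with big improvements, giving a small tree
fit.large.cp <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                      method = "class",
                      control = rpart.control(cp = 0.1))

## fit$frame has one row per node of the fitted tree
nrow(fit.small.cp$frame)   ## many nodes
nrow(fit.large.cp$frame)   ## few nodes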

(a) Consider the dataset ‘bank-sample.csv’ we discussed in the lectures. For
this exercise, we will fit a decision tree with subscribed as the outcome
and job, marital, education, default, housing, loan, contact and
poutcome as feature variables. We want to find the best Cp value in
terms of misclassification error rate.
1. Randomly split the entire dataset into 10 mutually exclusive subsets.
2. Let Cp take on the values 10^k for k = −5, −4, −3, ..., 0, ..., 3, 4, 5.
3. At each Cp value, run the following loop for j = 1, 2, ..., 10:
(a) Set the j-th group to be the test set
(b) Fit a decision tree on the other 9 sets with that value of Cp
(c) Predict the class assignment of subscribed for each observation
of the test set
(d) Calculate the number of misclassifications by comparing predicted
versus actual class labels in the test set
4. Determine the best Cp value in terms of misclassification error rate,
as defined below.
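
Here the misclassification error rate for a given Cp is the total number of
misclassified test observations, summed over all 10 folds, divided by the
total number of records n; the best Cp is the one with the smallest rate.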

library("rpart")
library("rpart.plot")

## CV for decision tree

banktrain <- read.table("bank-sample.csv", header = TRUE, sep = ",")

## drop a few columns to simplify the tree
drops <- c("age", "balance", "day", "campaign",
           "pdays", "previous", "month", "duration")
banktrain <- banktrain[, !(names(banktrain) %in% drops)]

## total number of records in the dataset
n = dim(banktrain)[1]

## randomly split into 10 folds
## (we have seen this code before)
n_folds = 10
folds_j <- sample(rep(1:n_folds, length.out = n))
table(folds_j)

cp = 10^(-5:5)
misC = rep(0, length(cp))

for (i in 1:length(cp)) {

  misclass = 0
  for (j in 1:n_folds) {
    test <- which(folds_j == j)
    train = banktrain[-c(test), ]
    fit <- rpart(subscribed ~ job + marital +
                   education + default + housing +
                   loan + contact + poutcome,
                 method = "class",
                 data = train,
                 control = rpart.control(cp = cp[i]),
                 parms = list(split = "information"))

    ## columns 1 to 8 are the features; column 9 is subscribed
    new.data = data.frame(banktrain[test, c(1:8)])
    ## predict label for test data based on fitted tree
    prd = predict(fit, new.data, type = "class")
    misclass = misclass + sum(prd != banktrain[test, 9])
  }
  misC[i] = misclass / n
}

plot(log(cp, base = 10), misC, type = "b")
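
As a side note (not required by the exercise), rpart also performs its own
internal cross-validation while growing a tree; printcp() and plotcp() expose
the resulting table of cp values and cross-validated errors. A minimal
sketch, reusing banktrain as prepared above:

fit.full <- rpart(subscribed ~ job + marital + education + default +
                    housing + loan + contact + poutcome,
                  method = "class", data = banktrain,
                  control = rpart.control(cp = 1e-5, xval = 10),
                  parms = list(split = "information"))
printcp(fit.full)   ## cp table with cross-validated relative error (xerror)
plotcp(fit.full)    ## plot of cross-validated error against cp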

(b) Plot the decision tree fitted with the best Cp value in terms of
misclassification rate.
## determine the best cp in terms of
## misclassification rate
## (which.min picks a single value even if several cp tie for the minimum)
best.cp = cp[which.min(misC)]

## fit the decision tree on the full dataset with the best cp
## (banktrain, not the leftover train split from the last loop iteration)
fit <- rpart(subscribed ~ job + marital +
               education + default + housing +
               loan + contact + poutcome,
             method = "class",
             data = banktrain,
             control = rpart.control(cp = best.cp),
             parms = list(split = "information"))

## plot the tree
rpart.plot(fit, type = 4, extra = 2)
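
In the call above, type = 4 labels every node (not only the leaves) with its
splitting rule, and extra = 2 prints the classification rate at each node as
the number of correct classifications over the number of observations in
that node.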
