Cs4407 Programming Assignment 5
Data Mining and Machine Learning (proctored course) (University of the People)
Assignment Unit 5
For the Unit 5 Programming Assignment, follow the instructions for the lab in section 8.3 of the
James et al. (2013) textbook. When you are comfortable with the lab, you will build a decision
tree using the following data.
Data Set Information:
This radar data was collected by a system in Goose Bay, Labrador. This system consists of a phased
array of 16 high-frequency antennas with a total transmitted power on the order of 6.4 kilowatts. See
the paper for more details. The targets were free electrons in the ionosphere.
"Good" radar returns are those showing evidence of some type of structure in the ionosphere.
"Bad" returns are those that do not; their signals pass through the ionosphere.
Received signals were processed using an autocorrelation function whose arguments are the time of
a pulse and the pulse number. There were 17 pulse numbers for the Goose Bay system. Instances in
this database are described by 2 attributes per pulse number, corresponding to the complex values
returned by the function resulting from the complex electromagnetic signal.
Attribute Information:
All 34 predictor attributes are continuous.
The 35th attribute is either "good" or "bad" according to the definition summarized above.
This is a binary classification exercise.
Download the data set:
https://my.uopeople.edu/pluginfile.php/295432/mod_workshop/instructauthors/Ionosphere.txt
This assignment follows the programming lab in section 8.3 of the textbook closely. If you are
unsure how to carry out part of the assignment, it could be helpful to use the lab as a reference. It
might also be helpful to refer to the manual for the rpart package:
https://cran.r-project.org/web/packages/rpart/rpart.pdf
Part 1: Print decision tree
a. We begin by setting the working directory, loading the required packages (rpart and mlbench)
and then loading the Ionosphere dataset.
#set working directory if needed (modify path as needed)
setwd("working directory")
#load required libraries – rpart for classification and regression trees
library(rpart)
#mlbench for Ionosphere dataset
library(mlbench)
#load Ionosphere
data(Ionosphere)
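b. Use the rpart() method to create a classification tree for the data. Because the response
(Class) is a factor, rpart() fits a classification tree rather than a regression tree.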
rpart(Class~.,Ionosphere)
c. Use the plot() and text() methods to plot the decision tree.
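The steps in Part 1 can be sketched as follows. This is a minimal example, not the required solution: the object name fit and the plotting arguments (uniform, margin, use.n, cex) are illustrative choices that do not come from the assignment text.

```r
# Minimal sketch of Part 1: fit a classification tree and draw it.
library(rpart)    # recursive partitioning trees
library(mlbench)  # provides the Ionosphere data set

data(Ionosphere)

# Class is a factor, so rpart would pick method = "class" by default;
# it is spelled out here only to make the intent explicit.
fit <- rpart(Class ~ ., data = Ionosphere, method = "class")

# Text summary of the fitted splits
print(fit)

# plot() draws the tree skeleton and text() adds the split labels.
# The argument values below are illustrative, not prescribed.
plot(fit, uniform = TRUE, margin = 0.1)
text(fit, use.n = TRUE, cex = 0.8)
```

Note that text() must be called after plot(), since it only annotates a tree that has already been drawn on the current graphics device.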
Part 2: Estimate accuracy
a. Split the data into test and training subsets using the sample() method.
b. Use the rpart method to create a decision tree using the training data.
rpart(Class~.,Ionosphere,subset=train)
c. Use the predict method to find the predicted class labels for the testing data.
d. Use the table method to create a table of the predictions versus true labels and then compute the
accuracy. The accuracy is the number of correctly assigned good cases (true positives) plus the
number of correctly assigned bad cases (true negatives) divided by the total number of testing
cases.
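One possible way to carry out steps a through d is sketched below. The seed, the 50/50 split proportion, and the names train, test_data, pred, conf_mat, and accuracy are assumptions made for illustration; the assignment does not specify them.

```r
# Minimal sketch of Part 2: train/test split, prediction, and accuracy.
library(rpart)
library(mlbench)

data(Ionosphere)

set.seed(1)  # arbitrary seed, used only to make the split reproducible

n <- nrow(Ionosphere)
# Assumption: half of the rows are used for training.
train <- sample(seq_len(n), size = floor(n / 2))

# Fit the tree on the training rows only
fit <- rpart(Class ~ ., data = Ionosphere, subset = train, method = "class")

# Predict class labels for the held-out rows
test_data <- Ionosphere[-train, ]
pred <- predict(fit, newdata = test_data, type = "class")

# Confusion matrix: rows are predictions, columns are true labels
conf_mat <- table(predicted = pred, actual = test_data$Class)
print(conf_mat)

# Accuracy = (true positives + true negatives) / number of test cases,
# which is the sum of the diagonal divided by the table total.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```

The accuracy will vary with the random split, so changing the seed or the split proportion will give a somewhat different value.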
References:
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. Springer.
bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf
project.org/doc/manuals/R-intro.pdf