CSE 455/555 Spring 2012 Homework 1: Bayes
Jason J. Corso
Computer Science and Engineering
SUNY at Buffalo
jcorso@buffalo.edu
Date Assigned: 31 Jan 2012
Date Due: 27 Feb 2012
Homework must be submitted electronically by midnight on the due date (see below). No late work will be accepted.
Remember, you are permitted to discuss this assignment with other students, whether or not they are in the class, but
you must write up your own work from scratch.
I am sure the answers to some or all of these questions can be found on the internet. Copying from any other
source is indeed cheating, and it will undermine the primary reason you are taking this course: to learn.
This class has a zero-tolerance policy toward cheating. Don't do it.
We also define another two decision rules, which maximize the prior and the likelihood, respectively:

\hat{\omega}_{\text{prior}}(x) = \hat{\omega}_{\text{prior}} = \operatorname*{argmax}_{\omega} \, p(\omega), \qquad \hat{\omega}_{\text{lik}}(x) = \operatorname*{argmax}_{\omega} \, p(x \mid \omega)
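To make the two rules concrete, here is a minimal sketch in Python (not part of the assignment), assuming the priors are stored in a numpy array and the class-conditional densities are placeholder Gaussians rather than the course's actual models:

```python
import numpy as np

# Assumed setup: priors[i] = p(omega_i); the unit-variance Gaussians below are
# placeholder class-conditional densities p(x | omega_i), not the course's models.
priors = np.array([0.5, 0.3, 0.2])
means = np.array([0.0, 1.0, 2.0])

def likelihood(x, i):
    # Placeholder density p(x | omega_i): unit-variance Gaussian centered at means[i].
    return np.exp(-0.5 * (x - means[i]) ** 2) / np.sqrt(2.0 * np.pi)

def decide_prior(x):
    # argmax over omega of p(omega); note the decision ignores x entirely.
    return int(np.argmax(priors))

def decide_likelihood(x):
    # argmax over omega of p(x | omega).
    return int(np.argmax([likelihood(x, i) for i in range(len(priors))]))

print(decide_prior(1.8), decide_likelihood(1.8))  # prints 0 2
```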
2. (5%) Next, implement the multiclass variance impurity and retrain the decision tree. Compute the accuracy.
Describe what has changed, and also submit the .py file containing the impurity function. (A sketch of one
possible impurity function is given below.)
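As a concrete starting point, here is a minimal sketch that assumes the two-class variance impurity i(N) = P(ω1)P(ω2) generalizes to a sum over unordered class pairs; this equals half the Gini impurity, so depending on whether your definition counts ordered or unordered pairs it may differ by a factor of 2:

```python
import numpy as np

def variance_impurity(labels):
    # Multiclass variance impurity: sum over unordered class pairs of P(w_i)P(w_j),
    # which simplifies to (1 - sum_i P(w_i)^2) / 2. `labels` is a 1-D integer array
    # of the class labels of the samples reaching a node.
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 0.5 * (1.0 - np.sum(p ** 2))
```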
3. (35%) Next, you are required to implement a Random Forest classifier for classifying grayscale images of
handwritten digits. For simplicity, I have provided an rfdigits_hw1.py starter file that contains the full
skeleton of the random forest code (both learning and classification) as well as the functionality to load the
data and compute features on it.
You are required to implement the missing parts of the code as specified in the file. Two of the parts are for
the classification side and one is for the learning side. You need to do all three.
Just like in the Amit and Geman digits example given in class, the features that have been defined in this code
are too many to enumerate. So, during the randomized learning of the tree, you will have to dynamically
instantiate a random set of, say, 100 features and learn the best query based on them. Then, at the next node,
you instantiate a new random set of features, and so on. The actual features that we will use are simple weighted
sums of pixel values (all of the images are 28 × 28 grayscale). The class KSum_Feature has been provided
for you to easily compute these features. Finally, the class DTQuery_KSum has been provided to encapsulate
a query based on these features; you do not need to use it if you don't want to, but it is recommended.
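To make the per-node randomization concrete, here is a minimal sketch; the helper names below are hypothetical stand-ins for the KSum_Feature and DTQuery_KSum classes in the starter file:

```python
import numpy as np

def sample_candidate_features(num_features=100, k=2, img_shape=(28, 28), rng=None):
    # Draw `num_features` random weighted k-sums of pixel values; each candidate
    # is a (pixel indices, weights) pair. Hypothetical stand-in for KSum_Feature.
    rng = rng if rng is not None else np.random.default_rng()
    candidates = []
    for _ in range(num_features):
        pixels = rng.integers(0, img_shape[0] * img_shape[1], size=k)
        weights = rng.uniform(-1.0, 1.0, size=k)
        candidates.append((pixels, weights))
    return candidates

def feature_value(image, pixels, weights):
    # Weighted sum of the selected pixel values (image flattened to 1-D).
    return float(np.dot(image.ravel()[pixels], weights))
```

At each node you would evaluate every candidate on the examples reaching that node, keep the (feature, threshold) pair that most reduces the impurity, and then draw a fresh candidate set for each child node.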
You need to complete the rfdigits_hw1.py file and execute it (by typing, for example,
python rfdigits_hw1.py, assuming your PYTHONPATH is set properly). Running this script will load a
subset of the data, train the forest, and then compute the accuracy on the training set as well as the accuracy on
the provided testing set.
You need to submit the following:
(a) The completed rfdigits_hw1.py file. (We will run this separately with different data.)
(b) A short description of how you implemented the training function for the randomized trees.
(c) A short description of what happens when you run the rfdigits_hw1.py file; i.e., what are your
accuracies? How do the accuracies vary as you change the parameters of the tree? For example, if you
use a featureDepth of 10 or 100, how does the accuracy change? (A sketch of one such parameter
sweep is given after this list.)
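For part (c), the parameter study can be organized as a simple sweep. The sketch below is hypothetical throughout: train_forest and accuracy are stand-in stubs for whatever entry points your completed rfdigits_hw1.py exposes, and the random arrays exist only so the sketch runs on its own.

```python
import numpy as np

# Stand-in data so the sketch is self-contained; replace with the loaded digits.
rng = np.random.default_rng(0)
images = rng.uniform(size=(200, 28, 28))
labels = rng.integers(0, 10, size=200)

def train_forest(images, labels, feature_depth):
    # Hypothetical stub: your rfdigits_hw1.py training call goes here.
    return {"featureDepth": feature_depth}

def accuracy(forest, images, labels):
    # Hypothetical stub: your rfdigits_hw1.py classification call goes here.
    return 0.0

for feature_depth in (10, 50, 100):
    forest = train_forest(images, labels, feature_depth)
    print(f"featureDepth={feature_depth}: accuracy={accuracy(forest, images, labels):.3f}")
```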
In addition to these requirements, you are encouraged to explore the classifier and the problem further: for
example, display images of digits that were misclassified, or visualize the paths down a tree for the selected
type of feature.
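If you pursue the first suggestion, here is a minimal sketch of displaying misclassified digits with matplotlib; the arrays test_images, test_labels, and predictions are assumed outputs of your completed rfdigits_hw1.py run (shapes (N, 28, 28), (N,), and (N,)):

```python
import numpy as np
import matplotlib.pyplot as plt

def show_misclassified(test_images, test_labels, predictions, max_show=8):
    # Find indices where the forest's prediction disagrees with the true label.
    wrong = np.flatnonzero(predictions != test_labels)[:max_show]
    if len(wrong) == 0:
        print("no misclassifications to show")
        return
    fig, axes = plt.subplots(1, len(wrong), figsize=(2 * len(wrong), 2))
    for ax, idx in zip(np.atleast_1d(axes), wrong):
        ax.imshow(test_images[idx], cmap="gray")
        ax.set_title(f"true {test_labels[idx]}, pred {predictions[idx]}")
        ax.axis("off")
    plt.show()
```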