CSC 309 Project 10 K-Nearest Neighbors With Scikit-Learn
For this project you will implement a machine learning classifier called K-Nearest
Neighbors (KNN) and use it on the Iris dataset.
For more information on the Iris dataset, please see the Wikipedia page:
https://en.wikipedia.org/wiki/Iris_flower_data_set
Your program will read the training (x1, y1) and test (x2) data sets from files provided on
the website, run KNN, and predict the labels for the test set.
http://unixlab.sfsu.edu/~ats/csc309s18/projects/p10/
At that link, you will find three files: x1, y1, and x2.
Your program must train your KNN classifier on x1 and y1, then predict the values of y2
when given x2. I will then compare your answers with the actual answers for
correctness.
Your program will read each file's data into a numpy array. Hint: Use np.genfromtxt().
https://docs.scipy.org/doc/numpy/reference/routines.io.html
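As a rough illustration of the hint above, the sketch below reads the three files with np.genfromtxt(). The helper name load_arrays, the whitespace-delimited layout, and the integer labels in y1 are assumptions, not something this handout specifies; check the real files and pass delimiter="," if they turn out to be comma-separated. The demo writes tiny stand-in files with made-up values so it can run anywhere.

```python
import os
import tempfile

import numpy as np


def load_arrays(folder, delimiter=None):
    """Read the files x1, y1 and x2 from `folder` into numpy arrays.

    delimiter=None splits on whitespace (an assumption about the file
    layout); pass "," for comma-separated files.
    """
    x1 = np.genfromtxt(os.path.join(folder, "x1"), delimiter=delimiter)
    # Labels are assumed to be integers, as in scikit-learn's iris.target.
    y1 = np.genfromtxt(os.path.join(folder, "y1"), delimiter=delimiter, dtype=int)
    x2 = np.genfromtxt(os.path.join(folder, "x2"), delimiter=delimiter)
    return x1, y1, x2


# Demo on stand-in files with made-up values (NOT the course data).
with tempfile.TemporaryDirectory() as folder:
    np.savetxt(os.path.join(folder, "x1"), [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]])
    np.savetxt(os.path.join(folder, "y1"), [0, 2], fmt="%d")
    np.savetxt(os.path.join(folder, "x2"), [[5.0, 3.4, 1.5, 0.2], [6.4, 3.0, 5.5, 2.0]])
    x1, y1, x2 = load_arrays(folder)

print(x1.shape, y1.shape, x2.shape)
```

Printing the shapes right after loading is a quick way to confirm that x1 and x2 have the same number of columns and that y1 has one label per row of x1.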
Then, using those numpy arrays, you simply need to run an example similar to the one
we went over in the class slides for scikit-learn. Instead of loading the Iris data from
the datasets module, you will read the data from the files provided on the class
website, as previously mentioned.
# Example from class, using scikit-learn
from sklearn import neighbors, datasets
from sklearn.model_selection import train_test_split
# load the iris dataset into a variable
iris = datasets.load_iris()
k = 15
# split the dataset into x_train, x_test, y_train, y_test
x1, x2, y1, y2 = train_test_split(iris.data, iris.target,
                                  test_size=0.1)
# Instantiate KNN classifier
clf = neighbors.KNeighborsClassifier(k)
# Fit the classifier with training data
clf.fit(x1, y1)
# Print the predicted and actual labels
print("Predicted labels: ", clf.predict(x2))
print("Actual labels: ", y2)
SAMPLE OUTPUT:
Predicted labels: [2 2 2 2 2 0 2 0 0 0 0 1 0 0 1]
Actual labels: [2 2 2 2 2 0 2 0 0 0 0 1 0 0 1]
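For this project the data already arrives split into x1, y1, and x2, so the train_test_split step from the class example is not needed. The sketch below shows one way the fit and predict steps might look on pre-split arrays. The arrays here are small made-up stand-ins (two classes, three training rows each), and k = 3 is chosen only to suit that tiny sample; neither is a value prescribed by the assignment.

```python
import numpy as np
from sklearn import neighbors

# Stand-in training data (made-up values, NOT the course files).
x1 = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [5.0, 3.4, 1.5, 0.2],
    [6.7, 3.0, 5.2, 2.3],
    [6.3, 2.9, 5.6, 1.8],
    [6.5, 3.0, 5.8, 2.2],
])
y1 = np.array([0, 0, 0, 2, 2, 2])

# Stand-in test data: one row near each cluster above.
x2 = np.array([
    [5.0, 3.3, 1.4, 0.2],
    [6.4, 3.0, 5.5, 2.0],
])

# k must not exceed the number of training rows.
k = 3
clf = neighbors.KNeighborsClassifier(n_neighbors=k)
clf.fit(x1, y1)

# There is no y2 to compare against here; the prediction is the answer.
y2_pred = clf.predict(x2)
print("Predicted labels: ", y2_pred)
```

In the real program, x1, y1, and x2 would come from the files on the class website rather than from literals.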
If you do not have scikit-learn, try running: "sudo pip3 install scikit-learn"
This project is intended to be very easy and to expose you to your first machine
learning application. You are encouraged to try different values for K, inspect
variables, and try other classifiers.
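One possible way to experiment with different values for K is sketched below. It uses 5-fold cross-validation on scikit-learn's built-in copy of the Iris data; cross-validation and the particular K values tried (1, 5, 15) are assumptions added for illustration and are not part of the class example.

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import cross_val_score

# Built-in Iris data, as in the class example.
iris = datasets.load_iris()

mean_accuracy = {}
for k in (1, 5, 15):
    clf = neighbors.KNeighborsClassifier(n_neighbors=k)
    # Each of the 5 folds is held out once while the rest is used for fitting.
    scores = cross_val_score(clf, iris.data, iris.target, cv=5)
    mean_accuracy[k] = scores.mean()
    print(f"k={k:2d}  mean accuracy={mean_accuracy[k]:.3f}")
```

Swapping KNeighborsClassifier for another scikit-learn classifier in the loop body is a simple way to compare classifiers on the same folds.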
Submission: Submit two files to the iLearn submission link: your main Python program
and an output.txt file that contains your y2 answers. You can use Python functions or
copy and paste to create the output.txt file. 90 points for correctness, 10 points for
headers, documentation, and clarity.
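If you prefer to create output.txt from Python rather than by copy and paste, np.savetxt() is one option. The sketch below is only an illustration: the handout does not specify a layout for output.txt, so one integer label per line is an assumption, and y2_pred holds made-up stand-in values. It writes to a temporary folder so that running it does not overwrite a real output.txt.

```python
import os
import tempfile

import numpy as np

# Stand-in predictions (made-up values); in the real program this would be
# the array returned by clf.predict(x2).
y2_pred = np.array([2, 0, 1, 1])

# Temporary folder used only for this demo.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "output.txt")

# fmt="%d" writes plain integers, one per line, instead of floats.
np.savetxt(out_path, y2_pred, fmt="%d")

# Read the file back to confirm what was written.
written = np.genfromtxt(out_path, dtype=int)
print(written)
```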