vertopal.com_MSML_Project_1
Mounted at /content/drive
[5 rows x 28 columns]
Binary Classifiers: Original Features
2.1 Linear Discriminant Analysis (LDA)
For LDA, we need to find a suitable projection vector w and classify based on the projections. Here’s how to implement LDA:
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
import matplotlib.pyplot as plt
# Fit LDA
lda = LDA()
lda.fit(X_train, y_train)
type1_errors.append(type1_error)
type2_errors.append(type2_error)
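The appended lists above suggest that the error rates were collected over several settings, although the loop itself is not shown. One plausible reading is a sweep over the decision threshold applied to the LDA projection. The sketch below illustrates that idea under stated assumptions: the data is a synthetic stand-in from `make_classification`, label 1 is taken to mean "approved" and 0 "denied", and the helper `error_rates` and the 21-point threshold grid are hypothetical, not taken from the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 1 = approved, 0 = denied.
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)

lda = LDA()
lda.fit(X_train, y_train)

# For two classes, the fitted model exposes the projection vector w
# (coef_) and an offset b (intercept_); the score is w.x + b.
w = lda.coef_.ravel()
b = lda.intercept_[0]
scores = X_test @ w + b
assert np.allclose(scores, lda.decision_function(X_test))


def error_rates(y_true, y_pred):
    """Hypothetical helper: per-class error rates, notebook's naming."""
    type1 = np.mean(y_pred[y_true == 1] == 0)  # denied when approved
    type2 = np.mean(y_pred[y_true == 0] == 1)  # approved when denied
    return type1, type2


# Sweep the threshold on the projected scores and record both rates.
thresholds = np.linspace(scores.min(), scores.max(), 21)
type1_errors, type2_errors = [], []
for t in thresholds:
    y_pred = (scores > t).astype(int)
    type1_error, type2_error = error_rates(y_test, y_pred)
    type1_errors.append(type1_error)
    type2_errors.append(type2_error)
```

Raising the threshold can only turn approvals into denials, so the first rate never decreases and the second never increases along the sweep; plotting the two lists against `thresholds` shows the trade-off.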
# Predictions
y_pred_tree = tree_clf.predict(X_test)
# Confusion Matrix
cm_tree = confusion_matrix(y_test, y_pred_tree)
type1_error_tree = cm_tree[1, 0] / 200 # Denied when Approved
type2_error_tree = cm_tree[0, 1] / 200 # Approved when Denied
Decision Tree - Type 1 Error Rate: 0.155, Type 2 Error Rate: 0.175
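The rates above divide by a hard-coded 200, and the notebook does not say whether that is the number of test samples per class or something else. A minimal sketch of an alternative that reads the denominators from the confusion matrix itself is shown below. The function name `error_rates_from_cm` and the toy labels are hypothetical; the naming of the two error types follows the notebook's comments (1 = approved, 0 = denied is an assumption).

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def error_rates_from_cm(y_true, y_pred):
    """Per-class error rates, normalised by the true class counts."""
    # Rows are true labels, columns are predicted labels.
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    type1 = cm[1, 0] / cm[1].sum()  # denied when approved
    type2 = cm[0, 1] / cm[0].sum()  # approved when denied
    return type1, type2


# Toy labels (illustrative only): 4 approved, 6 denied.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 1, 1, 0])
t1, t2 = error_rates_from_cm(y_true, y_pred)
print(f"Type 1 Error Rate: {t1:.3f}, Type 2 Error Rate: {t2:.3f}")
```

If the test set does contain exactly 200 samples of each class, this gives the same numbers as dividing by 200; otherwise it avoids a silently wrong denominator.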
for k in k_values:
    knn_clf = KNeighborsClassifier(n_neighbors=k)
    knn_clf.fit(X_train, y_train)
    y_pred_knn = knn_clf.predict(X_test)
    type1_errors_knn.append(type1_error_knn)
    type2_errors_knn.append(type2_error_knn)
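The kNN loop above omits the step that computes the two error rates for each k. A self-contained sketch of the whole sweep might look like the following. The synthetic data, the particular `k_values`, and the choice of picking the k with the smallest summed error are all assumptions made for illustration, and the rates are normalised by the true class counts rather than a fixed constant.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): 1 = approved, 0 = denied.
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)

k_values = [1, 3, 5, 7, 9]  # hypothetical grid
type1_errors_knn, type2_errors_knn = [], []

for k in k_values:
    knn_clf = KNeighborsClassifier(n_neighbors=k)
    knn_clf.fit(X_train, y_train)
    y_pred_knn = knn_clf.predict(X_test)

    # Rows are true labels, columns are predicted labels.
    cm_knn = confusion_matrix(y_test, y_pred_knn, labels=[0, 1])
    type1_errors_knn.append(cm_knn[1, 0] / cm_knn[1].sum())  # denied when approved
    type2_errors_knn.append(cm_knn[0, 1] / cm_knn[0].sum())  # approved when denied

# One simple selection rule (an assumption): minimise the summed error.
total_errors = np.add(type1_errors_knn, type2_errors_knn)
best_k = k_values[int(np.argmin(total_errors))]
```

In practice k would be selected on a validation split or by cross-validation rather than on the test set; the test set is used here only to keep the sketch short.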
# Fit SVM
svm_clf = SVC(C=1.0, kernel='rbf', random_state=42) # Use RBF kernel
svm_clf.fit(X_train, y_train)
# Predictions
y_pred_svm = svm_clf.predict(X_test)
# Confusion Matrix
cm_svm = confusion_matrix(y_test, y_pred_svm)
type1_error_svm = cm_svm[1, 0] / 200 # Denied when Approved
type2_error_svm = cm_svm[0, 1] / 200 # Approved when Denied
Step 3: Binary Classifiers: PCA Features
Now, let’s apply PCA and use it to train the kNN and SVM classifiers:
# Apply PCA
pca = PCA(n_components=5) # Change number of components as needed
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
kNN with PCA - Type 1 Error Rate: 0.125, Type 2 Error Rate: 0.19
SVM with PCA - Type 1 Error Rate: 0.125, Type 2 Error Rate: 0.14
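The training code behind the two PCA results is not shown, so the following is only a sketch of how the projected features could feed both classifiers. The data is synthetic, `n_neighbors=5` is an arbitrary choice, and the `error_rates` helper is hypothetical; the PCA and SVC settings mirror the ones that appear in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 1 = approved, 0 = denied.
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)

# Fit PCA on the training split only, then reuse it on the test split.
pca = PCA(n_components=5)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
explained = pca.explained_variance_ratio_.sum()


def error_rates(y_true, y_pred):
    """Hypothetical helper: rates normalised by true class counts."""
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    return cm[1, 0] / cm[1].sum(), cm[0, 1] / cm[0].sum()


models = {
    "kNN with PCA": KNeighborsClassifier(n_neighbors=5),
    "SVM with PCA": SVC(C=1.0, kernel="rbf", random_state=42),
}
pca_results = {}
for name, model in models.items():
    model.fit(X_train_pca, y_train)
    pca_results[name] = error_rates(y_test, model.predict(X_test_pca))

for name, (t1, t2) in pca_results.items():
    print(f"{name} - Type 1 Error Rate: {t1:.3f}, Type 2 Error Rate: {t2:.3f}")
```

Checking `explained` is a quick way to judge whether five components retain enough variance. PCA is also sensitive to feature scale, so standardising the features first is often worth trying.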
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Create a DataFrame
error_df = pd.DataFrame(results)
# Bar width
bar_width = 0.35
ax.set_ylabel('Error Rate')
ax.set_title('Error Rates for Different Classifiers')
ax.set_xticks(index + bar_width / 2)
ax.set_xticklabels(error_df['Classifier'])
ax.legend()
plt.grid()
plt.tight_layout()
plt.show()
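The plotting fragments above leave out how `results`, `index`, and the bars themselves are built. A minimal sketch of a grouped bar chart is given below. The layout of `results` and its column names are assumptions, and only the error rates actually printed in this notebook are included.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Error rates copied from the outputs printed earlier in this notebook;
# the dictionary layout and column names are hypothetical.
results = {
    "Classifier": ["Decision Tree", "kNN with PCA", "SVM with PCA"],
    "Type 1 Error": [0.155, 0.125, 0.125],
    "Type 2 Error": [0.175, 0.19, 0.14],
}

# Create a DataFrame
error_df = pd.DataFrame(results)

# Bar width and x positions, one group per classifier
bar_width = 0.35
index = np.arange(len(error_df))

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(index, error_df["Type 1 Error"], bar_width, label="Type 1 Error")
ax.bar(index + bar_width, error_df["Type 2 Error"], bar_width,
       label="Type 2 Error")

ax.set_ylabel("Error Rate")
ax.set_title("Error Rates for Different Classifiers")
ax.set_xticks(index + bar_width / 2)  # centre each label under its pair
ax.set_xticklabels(error_df["Classifier"])
ax.legend()
ax.grid(axis="y")
fig.tight_layout()
```

Calling `plt.show()` afterwards displays the figure in a notebook, and `fig.savefig(...)` writes it to disk instead.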
import tarfile
import os