Heart Failure Dataset ML Algorithms
In [4]: import pandas as pd   # imports used throughout this notebook
import numpy as np
import seaborn as sns

df = pd.read_csv("/Users/randyasfandy/Downloads/heart_failure_clinical_records_dataset.csv")
In [255]: df.head(5)
In [256]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 299 entries, 0 to 298
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   age                       299 non-null    float64
 1   anaemia                   299 non-null    int64
 2   creatinine_phosphokinase  299 non-null    int64
 3   diabetes                  299 non-null    int64
 4   ejection_fraction         299 non-null    int64
 5   high_blood_pressure       299 non-null    int64
 6   platelets                 299 non-null    float64
 7   serum_creatinine          299 non-null    float64
 8   serum_sodium              299 non-null    int64
 9   sex                       299 non-null    int64
 10  smoking                   299 non-null    int64
 11  time                      299 non-null    int64
 12  DEATH_EVENT               299 non-null    int64
dtypes: float64(3), int64(10)
memory usage: 30.5 KB
In [578]: df.sample(5)
In [579]: df["high_blood_pressure"].value_counts()
Out[579]: 0    194
          1    105
          Name: high_blood_pressure, dtype: int64
In [580]: df["smoking"].value_counts()
Out[580]: 0    203
          1     96
          Name: smoking, dtype: int64
In [94]: sns.histplot(creat)  # creat: the creatinine_phosphokinase column, defined in an earlier cell
Out[94]: <AxesSubplot:xlabel='creatinine_phosphokinase', ylabel='Count'>
Classification Algorithms
The following classifiers are trained and compared:
- Random Forest Classifier
- Logistic Regression
- KNN
- Decision Tree
- SVM
- Naive Bayes

Random Forest Classifier
accuracy: 0.73
precision: 0.70
recall: 0.64
f1_score: 0.67
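The Random Forest training cells did not survive the export; only the metrics above remain. A minimal sketch of the presumed pattern, assuming the feature/target split and the x_train/x_test/y_train/y_test names used by the later cells (test size, n_estimators, and random_state are illustrative, not the author's):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# DEATH_EVENT is the label; everything else is a feature
x = df.drop(columns=["DEATH_EVENT"])
y = df["DEATH_EVENT"]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=4)

rf = RandomForestClassifier(n_estimators=100, random_state=4).fit(x_train, y_train)
yhat = rf.predict(x_test)
rf_accuracy = accuracy_score(y_test, yhat)
rf_precision = precision_score(y_test, yhat)
rf_recall = recall_score(y_test, yhat)
rf_f1 = f1_score(y_test, yhat)

The same four metrics are computed for each classifier below and collected in the Prediction Dictionary section.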
Logistic Regression
In [262]: from sklearn.linear_model import LogisticRegression
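Only the import survived; the fit-and-score cell presumably mirrored the Random Forest pattern. A sketch (max_iter=1000 is an assumption, added to ensure convergence on unscaled features):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

lr = LogisticRegression(max_iter=1000).fit(x_train, y_train)  # max_iter raised: assumption
yhat = lr.predict(x_test)
lr_accuracy, lr_precision = accuracy_score(y_test, yhat), precision_score(y_test, yhat)
lr_recall, lr_f1 = recall_score(y_test, yhat), f1_score(y_test, yhat)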
KNN
In [208]: from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
In [266]: Ks = 10
mean_acc = np.zeros((Ks-1))
std_acc = np.zeros((Ks-1))
for n in range(1, Ks):
    # Fit a model for each k and record test accuracy (the fit/predict lines
    # were lost in the export and are restored here; yhat and mean_acc are
    # both used below, so the reconstruction follows from context)
    neigh = KNeighborsClassifier(n_neighbors=n).fit(x_train, y_train)
    yhat = neigh.predict(x_test)
    mean_acc[n-1] = metrics.accuracy_score(y_test, yhat)
    std_acc[n-1] = np.std(yhat == y_test) / np.sqrt(yhat.shape[0])
mean_acc
In [267]: print("The best accuracy was with", mean_acc.max(), "with k=", mean_acc.argmax()+1)
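The cell producing the metrics below is missing; presumably the best k from the sweep was refit and scored, along these lines:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical reconstruction: refit at the best k found above
k_best = mean_acc.argmax() + 1
knn = KNeighborsClassifier(n_neighbors=k_best).fit(x_train, y_train)
yhat = knn.predict(x_test)
knn_accuracy, knn_precision = accuracy_score(y_test, yhat), precision_score(y_test, yhat)
knn_recall, knn_f1 = recall_score(y_test, yhat), f1_score(y_test, yhat)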
accuracy: 0.57
precision: 0.40
recall: 0.08
f1_score: 0.13
Decision Tree
In [271]: from sklearn.tree import DecisionTreeClassifier
In [272]: mean_acc = np.zeros((9))
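The loop that fills mean_acc was lost in the export; the print below implies a sweep over max_depth from 1 to 9. A sketch (criterion and random_state are assumptions):

from sklearn import metrics

for depth in range(1, 10):
    # Record test accuracy at each depth; non-depth hyperparameters are sklearn defaults
    dt = DecisionTreeClassifier(max_depth=depth, random_state=4).fit(x_train, y_train)
    yhat = dt.predict(x_test)
    mean_acc[depth-1] = metrics.accuracy_score(y_test, yhat)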
In [273]: print("The best accuracy was with", mean_acc.max(), "with depth=", mean_acc.argmax()+1)
The best accuracy was with 0.75 with depth= 1
accuracy: 0.75
precision: 0.81
recall: 0.52
f1_score: 0.63
SVM
In [279]: from sklearn import svm
from sklearn.metrics import f1_score
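The cell that produced Out[282] below is missing; given the repr, it was presumably along these lines (a sketch assuming the earlier split):

clf = svm.SVC(kernel='linear')
clf.fit(x_train, y_train)  # the repr of the fitted estimator is Out[282] below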
Out[282]: SVC(kernel='linear')
In [283]: yhat = clf.predict(x_test)
accuracy: 0.75
precision: 0.81
recall: 0.52
f1_score: 0.63
Naive Bayes
In [286]: from sklearn.naive_bayes import GaussianNB
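Only the import survived; a minimal sketch of the presumed fit-and-score cell that produced the metrics below:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

nb = GaussianNB().fit(x_train, y_train)
yhat = nb.predict(x_test)
nb_accuracy, nb_precision = accuracy_score(y_test, yhat), precision_score(y_test, yhat)
nb_recall, nb_f1 = recall_score(y_test, yhat), f1_score(y_test, yhat)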
accuracy: 0.73
precision: 0.91
recall: 0.40
f1_score: 0.56
Prediction Dictionary
In [295]: accuracy_list = [rf_accuracy, lr_accuracy, knn_accuracy, dt_accuracy, svm_accuracy, nb_accuracy]
precision_list = [rf_precision, lr_precision, knn_precision, dt_precision, svm_precision, nb_precision]
recall_list = [rf_recall, lr_recall, knn_recall, dt_recall, svm_recall, nb_recall]
f1_list = [rf_f1, lr_f1, knn_f1, dt_f1, svm_f1, nb_f1]
columns = ['Random Forest', 'Logistic Regression', 'KNN', 'Decision Tree', 'Support Vector Machine', 'Naive Bayes']
index = ['Accuracy', 'Precision', 'Recall', 'F1 Score']
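The cell that assembles these lists into a comparison table is not in the export; presumably something like:

# Rows are metrics, columns are models, matching the index/columns above
scores = pd.DataFrame([accuracy_list, precision_list, recall_list, f1_list],
                      index=index, columns=columns)
scores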
In [6]: # Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})
Out[6]: <AxesSubplot:>
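corr, mask, and cmap come from an earlier cell not captured here; this call follows the standard seaborn correlation-matrix recipe, so they were presumably built along these lines:

corr = df.corr()                                      # pairwise feature correlations
mask = np.triu(np.ones_like(corr, dtype=bool))        # hide the redundant upper triangle
cmap = sns.diverging_palette(230, 20, as_cmap=True)   # diverging palette centered at zero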
Feature Selection
In [605]: from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
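The section is cut off after the imports; a plausible continuation that ranks features via the imported RandomForestRegressor's impurity-based importances (a sketch, not the author's code; split parameters are assumptions):

x = df.drop(columns=["DEATH_EVENT"])
y = df["DEATH_EVENT"]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=4)

# Fit and rank features by impurity-based importance
model = RandomForestRegressor(random_state=4).fit(x_train, y_train)
importances = pd.Series(model.feature_importances_, index=x.columns).sort_values(ascending=False)
print(importances)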