Roll NO 2020
Project report
Title: Heart Disease Classification using Machine Learning
Introduction:
This lab is a step-by-step guide to heart disease classification using machine learning. We use a
dataset of attributes related to heart health and build models that predict the presence or absence
of heart disease. The lab covers data preprocessing, exploratory data analysis, model creation, and
evaluation. Both models are artificial neural networks built with Keras: one trained on one-hot
(categorical) labels and one trained as a binary classifier.
Prerequisites:
1. Basic knowledge of Python programming.
2. Familiarity with data manipulation using Pandas.
3. Understanding of machine learning concepts, including supervised learning and classification.
Lab Steps:
1. Importing Required Libraries:
Import the necessary libraries, including sys, pandas, numpy, sklearn, matplotlib, seaborn, and
keras.
Check the versions of the imported libraries.
2. Loading and Cleaning the Data:
Read the dataset from a CSV file into a Pandas DataFrame and print its shape.
Remove rows with missing values (marked with "?") and convert all columns to numeric types.
Print summary statistics and plot a histogram of each variable.
3. Data Visualization:
Visualize the frequency of heart disease for different age groups using a bar plot.
Create a correlation matrix heatmap to analyze the relationships between variables.
Plot a line graph showing the relationship between age and maximum heart rate achieved
(thalach).
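The correlation heatmap in this step is drawn from a matrix of pairwise Pearson coefficients. The sketch below shows how such a matrix can be computed with NumPy. It runs on synthetic data: the column names match the dataset, but the values and the relationship between age and thalach are illustrative assumptions, not results from the dataset.

```python
import numpy as np

# Synthetic stand-in for three columns; the values are illustrative only.
rng = np.random.default_rng(0)
age = rng.uniform(30, 75, size=200)
# Assumed toy relationship: maximum heart rate falls as age rises, plus noise.
thalach = 210 - 0.9 * age + rng.normal(0, 8, size=200)
chol = rng.normal(240, 40, size=200)  # generated independently of age

columns = ["age", "thalach", "chol"]
# np.corrcoef treats each row as one variable, so stack the columns as rows.
corr = np.corrcoef(np.vstack([age, thalach, chol]))

for name, row in zip(columns, corr):
    print(name.ljust(8), np.round(row, 2))
```

Each cell of the heatmap corresponds to one entry of this matrix: the diagonal is always 1, and the matrix is symmetric.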
4. Data Preparation:
Separate the features (X) and the target variable (y) from the dataset.
Standardize the features by subtracting the mean and dividing by the standard deviation.
Split the data into training and testing sets using an 80:20 ratio.
Convert the target variable into categorical labels using one-hot encoding.
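The preparation steps above can be sketched with NumPy alone. This is a minimal illustration on toy data, not the code used in the lab: the helper names (standardize, split_80_20, one_hot) are hypothetical, and the lab itself relies on library functions for the split and the encoding.

```python
import numpy as np

def standardize(X):
    """Subtract each column's mean and divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def split_80_20(X, y, seed=0):
    """Shuffle the rows, then keep 80% for training and 20% for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    cut = int(0.8 * len(X))
    train, test = order[:cut], order[cut:]
    return X[train], X[test], y[train], y[test]

def one_hot(y, num_classes):
    """Turn integer labels into rows with a single 1 in the label's column."""
    return np.eye(num_classes)[y]

# Toy data: 10 samples, 3 features, binary labels (illustrative values only).
rng = np.random.default_rng(1)
X = rng.normal(50, 10, size=(10, 3))
y = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0])

X_std = standardize(X)
X_train, X_test, y_train, y_test = split_80_20(X_std, y)
Y_train = one_hot(y_train, num_classes=2)

print(X_train.shape, X_test.shape)
print(Y_train[:3])
```

The sketch standardizes before splitting, as the steps describe. Computing the mean and standard deviation from the training rows only is a common alternative that keeps test-set statistics out of training.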
5. Model Creation:
Define a function to create a neural network model using Keras.
Build a neural network model with two hidden layers and an output layer.
Compile the model using an appropriate loss function, optimizer, and metrics.
Print a summary of the model architecture.
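The model summary lists a parameter count for each layer, and these counts can be checked by hand: a fully connected layer has (inputs x units) weights plus one bias per unit, while dropout layers add no parameters. The sketch below assumes the layer sizes of the binary model in the code listing (13 inputs, hidden layers of 16 and 8 units, 1 output). The helper name dense_param_counts is hypothetical.

```python
def dense_param_counts(n_inputs, layer_units):
    """Return the trainable parameter count of each dense layer in order."""
    counts = []
    fan_in = n_inputs
    for units in layer_units:
        counts.append(fan_in * units + units)  # weights + biases
        fan_in = units
    return counts

counts = dense_param_counts(13, [16, 8, 1])
print(counts)       # [224, 136, 9]
print(sum(counts))  # 369
```

If the totals printed by the summary differ from a hand count like this one, the layer sizes or the input dimension are probably not what was intended.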
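6. Model Training and Evaluation:
Fit the model to the training data for 50 epochs with a batch size of 10, validating on the test set.
Plot the training and validation accuracy and loss for each epoch.
Convert the target into a binary label (heart disease or no heart disease) and train a second
model with a sigmoid output.
Generate an accuracy score and a classification report for both models.
Code: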
import sys
import pandas as pd
import numpy as np
import sklearn
import matplotlib
import keras
print('Python: {}'.format(sys.version))
print('Pandas: {}'.format(pd.__version__))
print('Numpy: {}'.format(np.__version__))
print('Sklearn: {}'.format(sklearn.__version__))
print('Matplotlib: {}'.format(matplotlib.__version__))
print('Keras: {}'.format(keras.__version__))
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
import seaborn as sns
# read the csv
cleveland = pd.read_csv('C:\\Users\\hp\\Videos\\New folder\\heart.csv')
# print the shape of the DataFrame, so we can see how many examples we have
print('Shape of DataFrame: {}'.format(cleveland.shape))
print(cleveland.loc[1])
# print the last twenty or so data points
cleveland.loc[280:]
# remove missing data (indicated with a "?")
data = cleveland[~cleveland.isin(['?'])]
data.loc[280:]
# drop rows with NaN values from DataFrame
data = data.dropna(axis=0)
data.loc[280:]
# print the shape and data type of the dataframe
print(data.shape)
print(data.dtypes)
# transform data to numeric to enable further analysis
data = data.apply(pd.to_numeric)
data.dtypes
# print data characteristics, using pandas built-in describe() function
data.describe()
# plot histograms for each variable
data.hist(figsize = (12, 12))
plt.show()
pd.crosstab(data.age,data.target).plot(kind="bar",figsize=(20,6))
plt.title('Heart Disease Frequency for Ages')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()
plt.figure(figsize=(10,10))
sns.heatmap(data.corr(),annot=True,fmt='.1f')
plt.show()
# compute the mean maximum heart rate (thalach) for each age
mean_thalach = data.groupby('age')['thalach'].mean()
plt.figure(figsize=(10, 5))
sns.lineplot(x=mean_thalach.index, y=mean_thalach.values, color='red', alpha=0.6)
plt.xlabel('Age', fontsize=15, color='blue')
plt.xticks(rotation=45)
plt.ylabel('Thalach', fontsize=15, color='blue')
plt.title('Age vs Thalach', fontsize=15, color='blue')
plt.grid()
plt.show()
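# separate the features (X) from the target variable (y)
X = np.array(data.drop(columns=['target']), dtype=float)
y = np.array(data['target'])
X[0]
# standardize each feature: subtract the mean, divide by the standard deviation
mean = X.mean(axis=0)
X -= mean
std = X.std(axis=0)
X /= std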
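# create X and Y datasets for training (80:20 split)
from sklearn import model_selection
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)
# convert the target into categorical labels using one-hot encoding
from keras.utils import to_categorical
num_classes = len(np.unique(y))
Y_train = to_categorical(y_train, num_classes)
Y_test = to_categorical(y_test, num_classes)
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import regularizers
# define a keras model for categorical classification
# assumed architecture: mirrors create_binary_model() below, with one softmax output per class
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(16, input_dim=13, kernel_initializer='normal',
                    kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(8, kernel_initializer='normal',
                    kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(num_classes, activation='softmax'))
    # compile model
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    return model
model = create_model()
print(model.summary())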
# fit the model to the training data
history=model.fit(X_train, Y_train, validation_data=(X_test, Y_test),epochs=50, batch_size=10)
import matplotlib.pyplot as plt
%matplotlib inline
# Model accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'])
plt.show()
# Model loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'])
plt.show()
# convert into binary classification problem - heart disease or no heart disease
Y_train_binary = y_train.copy()
Y_test_binary = y_test.copy()
Y_train_binary[Y_train_binary > 0] = 1
Y_test_binary[Y_test_binary > 0] = 1
print(Y_train_binary[:20])
# define a new keras model for binary classification
def create_binary_model():
    # create model: two regularized hidden layers with dropout, one sigmoid output
    model = Sequential()
    model.add(Dense(16, input_dim=13, kernel_initializer='normal',
                    kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(8, kernel_initializer='normal',
                    kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    return model
binary_model = create_binary_model()
print(binary_model.summary())
# fit the binary model on the training data
history = binary_model.fit(X_train, Y_train_binary, validation_data=(X_test, Y_test_binary),
                           epochs=50, batch_size=10)
import matplotlib.pyplot as plt
%matplotlib inline
# Model accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'])
plt.show()
# Model loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'])
plt.show()
# generate classification report using predictions for categorical model
from sklearn.metrics import classification_report, accuracy_score
categorical_pred = np.argmax(model.predict(X_test), axis=1)
print('Results for Categorical Model')
print(accuracy_score(y_test, categorical_pred))
print(classification_report(y_test, categorical_pred))
# generate classification report using predictions for binary model
binary_pred = np.round(binary_model.predict(X_test)).astype(int)
print('Results for Binary Model')
print(accuracy_score(Y_test_binary, binary_pred))
print(classification_report(Y_test_binary, binary_pred))
7. Conclusion:
Compare and analyze the results of the categorical and binary models.
Summarize the performance metrics and discuss the effectiveness of the models.
Highlight the importance of data preprocessing, feature selection, and model selection in
machine learning.
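When comparing the two models, it helps to know how the headline metrics are derived from the predictions. The sketch below computes accuracy, precision, recall, and F1 for the positive class from true and predicted 0/1 labels. The helper name binary_metrics is hypothetical and the labels are made up for illustration; they are not results from the models above.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For a screening task such as this one, recall on the disease class is often worth reporting alongside accuracy, since a missed case (a false negative) is usually the costlier error.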