


Roll No. 2020-EE-519

Project report
Title: Heart Disease Classification using Machine Learning
Introduction:
This lab provides a step-by-step guide to heart disease classification using machine learning
techniques. We use a dataset containing various attributes related to heart health and build models
that predict the presence or absence of heart disease. The lab covers data preprocessing,
exploratory data analysis, model creation, and evaluation. The models are artificial neural networks
built with Keras, trained first on one-hot (categorical) targets and then as a binary classifier.

Prerequisites:
1. Basic knowledge of Python programming.
2. Familiarity with data manipulation using Pandas.
3. Understanding of machine learning concepts, including supervised learning and classification.

Lab Steps:
1. Importing Required Libraries:
• Import the necessary libraries, including sys, pandas, numpy, sklearn, matplotlib, seaborn, and
keras.
• Check the versions of the imported libraries.

2. Loading and Exploring the Dataset:
• Read the heart disease dataset from a CSV file.
• Examine the shape of the dataset to determine the number of examples.
• Display a sample data point from the dataset.
• Remove missing data indicated by "?" and drop rows with NaN values.
• Print the shape and data types of the processed dataset.
• Explore the data characteristics using descriptive statistics and histograms.
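As a minimal sketch of the cleaning step above, the toy frame below marks "?" entries as missing, drops the affected rows, and converts the remaining columns to numeric types. The column values are made up for illustration and are not taken from heart.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "?" marks a missing measurement.
raw = pd.DataFrame({
    "age": [63, 67, 41, 56],
    "ca": ["0", "?", "2", "1"],
    "thal": ["6", "3", "?", "3"],
})

# Turn every "?" into NaN, then drop any row that contains a NaN.
clean = raw.replace("?", np.nan).dropna(axis=0)

# Columns that held "?" were read as strings; convert them to numbers.
clean = clean.apply(pd.to_numeric)

print(clean.shape)   # (rows left after cleaning, number of columns)
print(clean.dtypes)
```

Only the rows without any "?" survive, so the shape printed after cleaning shows how many examples were lost.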

3. Data Visualization:
• Visualize the frequency of heart disease for different age groups using a bar plot.
• Create a correlation matrix heatmap to analyze the relationships between variables.
• Plot a line graph showing the relationship between age and maximum heart rate achieved
(thalach).
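The age versus thalach line graph needs one mean heart-rate value per age. The sketch below shows that aggregation with a pandas groupby; the five rows are invented for illustration.

```python
import pandas as pd

# Hypothetical sample: two patients aged 50, two aged 60, one aged 70.
toy = pd.DataFrame({
    "age": [50, 50, 60, 60, 70],
    "thalach": [160, 150, 140, 130, 120],
})

# Mean maximum heart rate per age; groupby sorts the index by age.
mean_by_age = toy.groupby("age")["thalach"].mean()

ages = mean_by_age.index.tolist()
mean_thalach = mean_by_age.tolist()
print(ages)          # [50, 60, 70]
print(mean_thalach)  # [155.0, 135.0, 120.0]
```

The two lists can then be passed to a line plot as the x and y values.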

4. Data Preparation:
• Separate the features (X) and the target variable (y) from the dataset.
• Standardize the features by subtracting the mean and dividing by the standard deviation.
• Split the data into training and testing sets using an 80:20 ratio.
• Convert the target variable into categorical labels using one-hot encoding.
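Two of the preparation steps above are easy to check by hand. Standardization replaces each value x with (x - mean) / std, computed per feature column, and one-hot encoding turns an integer label k into a vector with a 1 in position k. The sketch below uses a made-up 4 x 2 feature matrix and plain NumPy; it is an illustration, not the lab's own data.

```python
import numpy as np

# Hypothetical feature matrix: 4 examples, 2 features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])
y = np.array([0, 1, 1, 0])

# Standardize each column: subtract its mean, divide by its standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# One-hot encode the integer labels by picking rows of an identity matrix.
num_classes = int(y.max()) + 1
Y = np.eye(num_classes)[y]

print(X_std.mean(axis=0))  # approximately 0 for every column
print(X_std.std(axis=0))   # approximately 1 for every column
print(Y)
```

After standardization both columns are on the same scale, even though the second feature was ten times larger than the first.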

5. Model Creation:
• Define a function to create a neural network model using Keras.
• Build a neural network model with two hidden layers and an output layer.
• Compile the model using an appropriate loss function, optimizer, and metrics.
• Print a summary of the model architecture.
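The model summary reports a parameter count for each layer. As a rough check, a dense layer with n inputs and m units has n * m weights plus m biases, and dropout layers add no parameters. The sketch below applies this to the layer sizes used in the code listing (13 inputs, hidden layers of 16 and 8 units, 2 outputs); the helper name dense_params is made up for this example.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input width followed by the width of each dense layer.
layer_sizes = [13, 16, 8, 2]

# Pair each layer's input width with its output width.
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [224, 136, 18] 378
```

For the binary model the last layer has a single unit, so its count becomes 8 * 1 + 1 = 9 under the same formula.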

6. Model Training and Evaluation:
• Fit the model to the training data and validate it on the testing data.
• Monitor the training progress by visualizing model accuracy and loss over epochs.
• Evaluate the model's performance using accuracy, precision, recall, and F1-score.
• Repeat steps 5 and 6 for binary classification, predicting the presence or absence of heart
disease.
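To make the four evaluation metrics concrete, the sketch below computes them by hand from true and predicted labels. The probabilities and labels are invented, the 0.5 threshold is an assumption for this example, and the helper name binary_metrics is hypothetical; in the lab itself these numbers come from sklearn's classification report.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for 0/1 labels, positive class = 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical sigmoid outputs, converted to labels with a 0.5 threshold.
probs = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1 if p >= 0.5 else 0 for p in probs]

accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)  # 0.75 0.75 0.75 0.75
```

Here one diseased case is missed and one healthy case is flagged, so all four metrics happen to coincide; on real data they usually differ, which is why the report lists them separately.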

Code:
import sys
import pandas as pd
import numpy as np
import sklearn
import matplotlib
import keras

print('Python: {}'.format(sys.version))
print('Pandas: {}'.format(pd.__version__))
print('Numpy: {}'.format(np.__version__))
print('Sklearn: {}'.format(sklearn.__version__))
print('Matplotlib: {}'.format(matplotlib.__version__))
print('Keras: {}'.format(keras.__version__))
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
import seaborn as sns
# read the csv
cleveland = pd.read_csv('C:\\Users\\hp\\Videos\\New folder\\heart.csv')
# print the shape of the DataFrame, so we can see how many examples we have
print( 'Shape of DataFrame: {}'.format(cleveland.shape))
print (cleveland.loc[1])
# print the last twenty or so data points
cleveland.loc[280:]
# remove missing data (indicated with a "?")
data = cleveland[~cleveland.isin(['?'])]
data.loc[280:]
# drop rows with NaN values from DataFrame
data = data.dropna(axis=0)
data.loc[280:]
# print the shape and data type of the dataframe
print(data.shape)
print(data.dtypes)
# transform data to numeric to enable further analysis
data = data.apply(pd.to_numeric)
data.dtypes
# print data characteristics, using pandas built-in describe() function
data.describe()
# plot histograms for each variable
data.hist(figsize = (12, 12))
plt.show()
pd.crosstab(data.age,data.target).plot(kind="bar",figsize=(20,6))
plt.title('Heart Disease Frequency for Ages')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()
plt.figure(figsize=(10,10))
sns.heatmap(data.corr(),annot=True,fmt='.1f')
plt.show()
age_unique = sorted(data.age.unique())
age_thalach_values = data.groupby('age')['thalach'].count().values
mean_thalach = []

# mean thalach per age: sum of thalach for that age divided by its row count
for i, age in enumerate(age_unique):
    mean_thalach.append(sum(data[data['age'] == age].thalach) / age_thalach_values[i])

plt.figure(figsize=(10, 5))
sns.lineplot(x=age_unique, y=mean_thalach, color='red', alpha=0.6)
plt.xlabel('Age', fontsize=15, color='blue')
plt.xticks(rotation=45)
plt.ylabel('Thalach', fontsize=15, color='blue')
plt.title('Age vs Thalach', fontsize=15, color='blue')
plt.grid()
plt.show()
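# separate the features (X) from the target (y)
X = np.array(data.drop(columns=['target']), dtype=float)
y = np.array(data['target'])
X[0]
# standardize each feature: subtract the mean, divide by the standard deviation
mean = X.mean(axis=0)
X -= mean
std = X.std(axis=0)
X /= std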
# create X and Y datasets for training
from sklearn import model_selection

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, stratify=y, random_state=42, test_size=0.2)
# convert the data to categorical labels
from keras.utils import to_categorical

Y_train = to_categorical(y_train, num_classes=None)
Y_test = to_categorical(y_test, num_classes=None)
print (Y_train.shape)
print (Y_train[:10])
X_train[0]
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.layers import Dropout
from keras import regularizers

# define a function to build the keras model
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(16, input_dim=13, kernel_initializer='normal',
                    kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(8, kernel_initializer='normal',
                    kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(2, activation='softmax'))

    # compile model
    # note: this Adam instance is created but not passed to compile(); training uses RMSprop
    adam = Adam(learning_rate=0.001)
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    return model

model = create_model()
print(model.summary())
# fit the model to the training data
history=model.fit(X_train, Y_train, validation_data=(X_test, Y_test),epochs=50, batch_size=10)
import matplotlib.pyplot as plt
%matplotlib inline

# Model accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'])
plt.show()
# Model loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'])
plt.show()
# convert into binary classification problem - heart disease or no heart disease
Y_train_binary = y_train.copy()
Y_test_binary = y_test.copy()

Y_train_binary[Y_train_binary > 0] = 1
Y_test_binary[Y_test_binary > 0] = 1
print(Y_train_binary[:20])
# define a new keras model for binary classification
def create_binary_model():
    # create model
    model = Sequential()
    model.add(Dense(16, input_dim=13, kernel_initializer='normal',
                    kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(8, kernel_initializer='normal',
                    kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(1, activation='sigmoid'))

    # compile model
    # note: this Adam instance is created but not passed to compile(); training uses RMSprop
    adam = Adam(learning_rate=0.001)
    model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    return model

binary_model = create_binary_model()

print(binary_model.summary())
# fit the binary model on the training data
history=binary_model.fit(X_train, Y_train_binary, validation_data=(X_test, Y_test_binary), epochs=50,
batch_size=10)
import matplotlib.pyplot as plt
%matplotlib inline

# Model accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'])
plt.show()
# Model loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'])
plt.show()
# generate classification report using predictions for categorical model
from sklearn.metrics import classification_report, accuracy_score
categorical_pred = np.argmax(model.predict(X_test), axis=1)
print('Results for Categorical Model')
print(accuracy_score(y_test, categorical_pred))
print(classification_report(y_test, categorical_pred))
# generate classification report using predictions for binary model
binary_pred = np.round(binary_model.predict(X_test)).astype(int)
print('Results for Binary Model')
print(accuracy_score(Y_test_binary, binary_pred))
print(classification_report(Y_test_binary, binary_pred))

7. Conclusion:
• Compare and analyze the results of the categorical and binary models.
• Summarize the performance metrics and discuss the effectiveness of the models.
• Highlight the importance of data preprocessing, feature selection, and model selection in
machine learning.
