2nd Review
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTING TECHNOLOGIES
18CSP107L - MINOR PROJECT
Cardiovascular disease refers to any critical condition that affects the heart.
Because heart diseases can be life-threatening, researchers are focusing on
designing smart systems that accurately diagnose them from electronic health data
with the aid of machine learning algorithms. This work presents several machine
learning approaches for predicting heart disease from patients' data on major
health factors. This project will demonstrate four classification methods, K-Nearest
Neighbour (KNN), Support Vector Machine (SVM), Random Forest (RF), and
Naive Bayes (NB), to build the prediction models. Data pre-processing and feature
selection will be carried out before the models are built. The models will be evaluated
on accuracy, precision, recall, and F1-score (the harmonic mean of precision and
recall, which balances the two). The SVM model is projected to perform
best, with 91.67% accuracy.
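As a minimal sketch of how the four evaluation metrics relate, the function below computes them from the counts of a binary confusion matrix (true/false positives and negatives). The function name and the confusion-matrix counts in the usage line are hypothetical and are not results from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all cases classified correctly
    precision = tp / (tp + fp)  # of the cases flagged as diseased, how many truly are
    recall = tp / (tp + fn)     # of the truly diseased cases, how many were flagged
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return accuracy, precision, recall, f1


# Hypothetical counts for a test set of 100 patients (not results from this project)
acc, prec, rec, f1 = classification_metrics(tp=40, fp=5, fn=10, tn=45)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
# prints: accuracy=0.8500 precision=0.8889 recall=0.8000 f1=0.8421
```

Plugging in the precision (92.31%) and recall (88.89%) reported in the literature review for the linear SVM gives F1 = 2 × 0.9231 × 0.8889 / (0.9231 + 0.8889) ≈ 0.9057, which agrees with the reported 90.56% to within rounding.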
26-8-2023 2
Introduction
• Globally, cardiovascular diseases (CVDs) are the primary cause of morbidity and
mortality, accounting for more than 70% of all fatalities. This project focuses on coronary
artery disease (damage to the heart's major blood vessels), high blood pressure, and cardiac
arrest. According to the 2017 Global Burden of Disease study, cardiovascular disease is
responsible for about 43% of all fatalities. Common risk factors for heart disease in
high-income nations include poor diet, cigarette use, excessive sugar consumption, and obesity
or excess body fat. However, low- and middle-income nations are also seeing a rise in the
prevalence of chronic illness.
• In addition, technologies such as electrocardiograms and CT scans, which are critical for
diagnosing coronary heart disease (reduced blood flow to the heart), are sometimes too
costly and impractical for patients. This reason alone has resulted in
the deaths of 17 million people. Twenty-five to thirty percent of firms' annual medical
expenses were attributable to employees with cardiovascular disease.
• Therefore, early detection of heart disease is essential to lessen its physical and monetary
cost to people and institutions. According to WHO estimates, the overall number of deaths
from CVDs is projected to rise to 23.6 million by 2030, with heart disease and stroke being the
leading causes.
Problem Statement
Create a predictive model employing advanced machine learning
methodologies to ascertain the presence or absence of cardiovascular
ailments (coronary artery disease, i.e. damage to the heart's major
blood vessels; high blood pressure; and cardiac arrest) within
patient cohorts, based on intricate clinical and biometric attributes. The
objective is to empower medical practitioners with a highly precise
diagnostic tool, enhancing prognostic capabilities and optimizing healthcare
resource allocation.
Motivation
• The main motivation for this research is to present a model that predicts
the occurrence of heart disease. Further, this research aims to identify the
best classification algorithm for determining the likelihood of heart
disease in a patient.
• This work is justified by a comparative study and analysis of three
classification algorithms, namely Naïve Bayes, Decision Tree, and
Random Forest, at different levels of evaluation.
• Although these are commonly used machine learning algorithms,
heart disease prediction is a vital task that demands the highest possible
accuracy. Hence, the three algorithms are evaluated at numerous
levels and with several types of evaluation strategies.
• This will enable researchers and medical practitioners to establish a
better …
Literature Review
The primary objective of this study was to classify heart disease using different models and a real-world
dataset. The k-modes clustering algorithm was applied to a dataset of patients with heart disease to
predict the presence of the disease. The dataset was preprocessed by converting the age attribute to
years and dividing it into bins of 5-year intervals, as well as dividing the diastolic and systolic blood
pressure data into bins of 10 intervals. The dataset was also split on the basis of gender to take into
account the unique characteristics and progression of heart disease in men and women.
The results indicated that the MLP (Multi-Layer Perceptron) model had the highest accuracy, 87.23%.
These findings demonstrate the potential of k-modes clustering to accurately predict heart disease and
suggest that the algorithm could be a valuable tool in developing targeted diagnostic and
treatment strategies for the disease. The study used the Kaggle cardiovascular disease dataset with
70,000 instances, and all algorithms were implemented on Google Colab. The accuracies of all
algorithms were above 86%, with the lowest, 86.37%, given by decision trees and the
highest given by the multilayer perceptron, as previously mentioned.
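k-modes works on categorical data, which is why continuous attributes such as age and blood pressure are binned first. The sketch below shows both steps in plain Python. It assumes ages are stored in days (as the conversion step implies) and reads "bins of 10 intervals" as bins 10 units wide; the helper names, the farthest-point initialisation, and the toy records are illustrative choices, not the reviewed study's code.

```python
from collections import Counter


def age_bin(age_days, width=5):
    """Convert an age stored in days to years, then return its 5-year bin
    (e.g. 19000 days, about 52 years, -> 50)."""
    years = age_days / 365.25
    return int(years // width) * width


def bp_bin(reading, width=10):
    """Put a blood-pressure reading into a 10-unit bin (127 -> 120)."""
    return int(reading // width) * width


def mismatches(a, b):
    """Simple matching dissimilarity used by k-modes: number of differing attributes."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(rows, k, n_iter=10):
    """Cluster categorical tuples; returns (labels, modes)."""
    # Deterministic farthest-point start: first row, then the row least like the chosen modes
    modes = [rows[0]]
    while len(modes) < k:
        modes.append(max(rows, key=lambda r: min(mismatches(r, m) for m in modes)))
    for _ in range(n_iter):
        # Assignment step: each row joins the mode it mismatches least
        labels = [min(range(k), key=lambda c: mismatches(r, modes[c])) for r in rows]
        # Update step: each attribute of a mode becomes the most frequent value in its cluster
        new_modes = []
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if not members:
                new_modes.append(modes[c])  # keep the old mode for an empty cluster
                continue
            new_modes.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
        if new_modes == modes:
            break  # converged: the next assignment would be identical
        modes = new_modes
    return labels, modes


raw = [  # hypothetical (age_days, systolic, diastolic, gender) records
    (18300, 142, 91, "M"), (20100, 144, 95, "M"), (18400, 151, 92, "M"),
    (14700, 112, 73, "F"), (16500, 115, 71, "F"), (14800, 124, 78, "F"),
]
rows = [(age_bin(a), bp_bin(s), bp_bin(d), g) for a, s, d, g in raw]
labels, modes = k_modes(rows, k=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```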
Literature Review
A cardiovascular disease detection model has been developed using three ML classification modelling
techniques. The project predicts which people have cardiovascular disease by extracting the factors
that lead to fatal heart disease from a dataset of patients' medical histories, such as chest pain,
sugar level, and blood pressure. The detection system assesses a patient based on his or her
clinical information, including any previous heart disease diagnosis. The algorithms used to build
the model are Logistic Regression, Random Forest Classifier, and KNN, and the model's accuracy is
87.5%. More training data increases the chance that the model accurately predicts whether a given
person has heart disease. Such computer-aided techniques can predict heart disease faster and
better, at much lower cost. Many medical databases are available to work with, and the authors
argue that these machine learning techniques can predict better than a human being, which helps
patients as well as doctors. In conclusion, the project predicts which patients have heart disease by
cleaning the dataset and applying logistic regression and KNN, reaching an average accuracy of
87.5%, which is better than the 85% of previous models. KNN had the highest accuracy of the three
algorithms used, 88.52%.
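The KNN classifier mentioned above labels a new patient by majority vote among the k most similar patients in the training data. A minimal sketch in plain Python, assuming Euclidean distance, k = 3, and features that have already been scaled to comparable ranges; the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training rows."""
    # Euclidean distance from the query to every training row, nearest first
    ranked = sorted(zip((math.dist(row, query) for row in train_X), train_y))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical, already min-max-scaled features (e.g. age, resting blood pressure);
# label 1 = heart disease, 0 = no heart disease
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25), (0.80, 0.90), (0.90, 0.80), (0.85, 0.95)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (0.20, 0.20)))  # prints 0
print(knn_predict(train_X, train_y, (0.85, 0.85)))  # prints 1
```

Scaling matters here: without it, an attribute measured in large units (such as blood pressure in mmHg) would dominate the distance. An odd k avoids tied votes for a two-class label.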
Literature Review
This work aims to predict the presence of heart disease in patients from specific
health measurements. The paper demonstrated four classification methods for building the
prediction model. The data was collected and cleaned of missing values and extreme
outliers. It was then preprocessed to fit the model requirements, which involved
visualizing the class imbalance, computing the correlation matrix, applying
dimensionality reduction, and finally splitting the data with the hold-out method. The model was
trained and tested with each machine learning algorithm. The SVM algorithm with a linear kernel
had the best results, with 91.67% accuracy, 92.31% precision, 88.89% recall, and an F1-score
of 90.56%. The algorithms used were able to extract the complex relations between the
symptoms and the disease. Machine learning algorithms can also be applied to other types of
diseases, especially as more accurate medical datasets are generated in the
future. This work can be enhanced by more extensive data analysis and by trying
additional algorithms to reach the maximum possible accuracy.
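A linear-kernel SVM learns a weight vector w and predicts the sign of w · x. The reviewed work does not say which solver it used, so the sketch below uses one standard option, the Pegasos stochastic sub-gradient method on the L2-regularised hinge loss, in plain Python. The regularisation strength, iteration count, seed, and toy data are assumptions for illustration.

```python
import random


def train_linear_svm(X, y, lam=0.01, n_iter=2000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    Labels must be -1 or +1. A constant 1.0 is appended to every row so the last
    weight acts as the bias (it is regularised too, a simplification)."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, n_iter + 1):
        x, label = rng.choice(data)             # one random training example per step
        eta = 1.0 / (lam * t)                   # Pegasos step size
        margin = label * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1 - eta * lam) * wi for wi in w]  # shrink: sub-gradient of the L2 term
        if margin < 1:                          # hinge loss is active for this example
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    """Sign of the linear score w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical toy data: +1 = heart disease, -1 = no heart disease
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([svm_predict(w, x) for x in X])  # prints [1, 1, 1, -1, -1, -1]
```

In practice a library implementation would normally be used instead; this version only shows what "linear kernel" means: the decision rule is a single hyperplane in the original feature space.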
Existing System
• Heart disease is often described as a silent killer because it can lead to a
person's death without obvious symptoms. The nature of the disease causes growing
anxiety about the disease and its consequences.
• Hence, continued efforts are being made to predict this deadly disease in
advance, and various tools and techniques are regularly being tested to meet
present-day health needs. Electrocardiograms and CT scans are the
current existing systems.
• Machine learning techniques can be a boon in this regard. Even though heart disease can
occur in different forms, there is a common set of core risk factors that influence whether
someone will ultimately be at risk for heart disease or not. By collecting data from
various sources, classifying it under suitable headings, and finally analysing it to extract
the desired information, conclusions can be drawn. This technique can be readily adapted to
the prediction of heart disease.
• As the well-known saying goes, "Prevention is better than cure": early prediction and
control can help prevent heart disease and decrease the death rate it causes.
Problem Statement and Objectives
Innovation Idea
The "Heart Disease Prediction using Machine Learning" project has several practical applications and potential
benefits, including:
i. Early Diagnosis: The project can assist in the early detection of heart disease, enabling timely interventions and
treatments to improve patient outcomes.
ii. Preventive Healthcare: By identifying individuals at risk, the project can support preventive measures such as
lifestyle changes and medication to reduce the risk of heart disease.
iii. Personalized Treatment: The project can provide tailored treatment recommendations based on individual risk
factors, optimizing healthcare delivery.
iv. Remote Monitoring: The real-time monitoring and telemedicine components facilitate remote patient care,
especially relevant in situations where physical visits to healthcare facilities are challenging.
v. Research and Insights: The project's data analysis can contribute to a better understanding of heart disease risk
factors and trends, assisting researchers in the field.
vi. Improved Decision Support: Healthcare providers can make more informed decisions with the support of
predictive models, improving patient care.
vii. Patient Empowerment: Patients can actively engage in their healthcare by monitoring their health data and
following personalized recommendations.
Architecture Model
Proposed Modules and Description
• This project uses ensemble learning, specifically bagging and boosting.
• Machine learning models can be combined in a variety of ways, but the most
common methods are bagging, boosting, and stacking.
• Bagging (bootstrap aggregating) involves training multiple models on different
bootstrap samples (random subsets drawn with replacement) of the same data set and
then aggregating their predictions. It is typically applied to decision trees, as in a
random forest, to reduce variance and overfitting. Boosting involves training
multiple models sequentially, where each model attempts to correct the errors of
the previous one.
• Lastly, stacking involves training multiple models on the same or different data
sets and then using their predictions as inputs to another model. This second-level
model is often a linear model or a neural network, and stacking is used to increase
accuracy and generalization.
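The bagging procedure described above can be sketched in plain Python: draw bootstrap samples, fit one weak model per sample, and combine them by majority vote. The base learner here is a one-feature decision stump chosen for brevity (a random forest would use full decision trees and random feature subsets); the ensemble size, seed, function names, and toy data are all assumptions.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """One-feature decision stump: predict 1 when x > threshold, else 0.
    The threshold is the midpoint between adjacent distinct values with the fewest errors."""
    if len(set(ys)) == 1 or len(set(xs)) == 1:  # degenerate bootstrap sample
        majority = Counter(ys).most_common(1)[0][0]
        return lambda x, c=majority: c
    values = sorted(set(xs))
    best_t, best_err = None, None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        err = sum(int(x > t) != y for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x, t=best_t: int(x > t)


def bagging_fit(xs, ys, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample: len(xs) rows drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate by majority vote (an odd n_estimators avoids ties for 0/1 labels)."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


# Hypothetical 1-D risk score and 0/1 heart-disease labels
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(xs, ys)
print([bagging_predict(models, x) for x in (0, 2, 8, 10)])  # prints [0, 0, 1, 1]
```

Because each stump sees a slightly different sample, their thresholds differ, and voting smooths out the variance of any single stump.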
Proposed Modules and Description
We used a single algorithm (linear regression) to learn all the components at once, but it is also
possible to use one algorithm for some of the components and another algorithm for the rest. We
are going with the AdaBoost and Random Forest algorithms. This way we can always
choose the best algorithm for each component. To do this, we use one algorithm to fit the original
target and then the second algorithm to fit the residuals (the errors left by the first model). Because
the residuals are continuous values, the model fitted to them has to be a regressor rather than a classifier.
• In detail, the process is as follows:
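from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
# Residuals are continuous, so both stages are fitted as regressors (assumed setup;
# X_train_1, X_train_2 and a numeric 0/1 y_train array are prepared beforehand)
model_1 = AdaBoostRegressor(random_state=0)
model_2 = RandomForestRegressor(random_state=0)
# 1. Train and predict with first model (predict on the features it was fitted on)
model_1.fit(X_train_1, y_train)
y_pred_1 = model_1.predict(X_train_1)
# 2. Train and predict with second model on the residuals of the first
model_2.fit(X_train_2, y_train - y_pred_1)
y_pred_2 = model_2.predict(X_train_2)
# 3. Add to get overall predictions, then threshold at 0.5 for a 0/1 label
y_pred = y_pred_1 + y_pred_2
y_label = (y_pred >= 0.5).astype(int)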
Attributes of Datasets
Intermediate Results and Discussion
Engineering Reference
Data mining plays a crucial role in heart disease prediction projects that use
machine learning. It involves extracting valuable patterns,
insights, and knowledge from large datasets to aid the accurate
prediction and early detection of heart disease.
The system's workflow starts with collecting the data and
selecting the important attributes. The data is then
pre-processed into the required format and divided into
two parts: training data and testing data. The algorithms are applied and
the model is trained on the training data. The accuracy of the
system is obtained by evaluating it on the testing data.
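The workflow above (split the pre-processed data into training and testing parts, train on one, measure accuracy on the other) can be sketched in plain Python. The 80/20 hold-out ratio, the seed, the toy records, and the nearest-centroid classifier standing in for the project's algorithms are all assumptions for illustration.

```python
import math
import random


def holdout_split(X, y, test_fraction=0.2, seed=42):
    """Shuffle the row indices once, then cut off the first test_fraction as the test set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


def fit_nearest_centroid(X, y):
    """Stand-in model: the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def accuracy(y_true, y_pred):
    """Fraction of test rows whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical pre-processed records: two features each, label 1 = heart disease
X = [(5, 5), (6, 5), (5, 6), (6, 6), (5.5, 5.5), (0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
X_train, X_test, y_train, y_test = holdout_split(X, y)
model = fit_nearest_centroid(X_train, y_train)
y_pred = [predict(model, x) for x in X_test]
print(len(X_train), len(X_test), accuracy(y_test, y_pred))  # prints 8 2 1.0
```

Keeping the test rows out of training is what makes the reported accuracy an estimate of performance on unseen patients rather than a measure of memorisation.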