Predictions in Heart Disease Using Techniques of Data Mining

2015 1st International Conference on Futuristic trend in Computational Analysis and Knowledge Management
(ABLAZE-2015)
Monika Gandhi
Computer Science and Engineering Department
ASET, Amity University, Noida, India
mgandhi9404@gmail.com

Dr. Shailendra Narayan Singh
Computer Science and Engineering Department
ASET, Amity University, Noida, India
snsingh36@amity.edu, sns2033@gmail.com
Abstract: A huge amount of information is produced in medical organizations (hospitals, medical centers), yet this information is not properly utilized. The health care system is "data rich" but "knowledge poor". There is a lack of effective analysis methods for discovering relationships and patterns in health care data. Data mining methods can help remedy this situation, and a range of data mining techniques can be used for this purpose. This paper describes the knowledge abstraction techniques based on data mining that are used in current research on the prediction of heart disease. The data mining methods Naive Bayes, neural network and decision tree are analyzed on medical data sets.

Keywords - Data mining, Heart disease, classification, prediction, Decision tree, Neural network, Naïve Bayes.

978-1-4799-8433-6/15/$31.00 ©2015 IEEE

I. INTRODUCTION

Advances in information technology, system integration and software development have produced a new generation of complex computer systems. These systems pose a number of challenges to information technology specialists. One example of such a system is the healthcare system. Recently, there has been increased interest in applying advances in data mining to healthcare systems. The objective of the present work is therefore to examine how healthcare data can be used to help people by means of machine learning and data mining techniques. The main aim is to propose an automated system for diagnosing heart disease that takes earlier information and data into account.

A major challenge confronting healthcare organizations (hospitals, medical centers) is the provision of quality services at affordable cost. Quality service implies diagnosing patients correctly and administering treatments that are effective. Poor clinical decisions can lead to poor outcomes, which are unacceptable. Healthcare organizations can reduce costs by adopting computer-based information and/or decision support systems. Healthcare data is very large, as it includes patient records, resource management information and updated information. Healthcare organizations must be able to analyze this data. The treatment records of many patients can be stored electronically, and data mining methods may help answer several important and critical questions related to healthcare organizations.

Clinical decisions are often made on the basis of doctors' intuition and experience rather than on the knowledge-rich data hidden in the database. This practice leads to unwanted biases, errors and excessive medical costs, which affect the quality of service provided to patients. Wu, et al. proposed that integrating clinical decision support with computer-based patient records could reduce medical errors, enhance patient safety, decrease unwanted practice variation, and improve patient outcomes [4]. This suggestion is promising, as data modelling and analysis tools such as data mining have the potential to create a knowledge-rich environment that can significantly improve the quality of clinical decisions.

Data mining is an important step of KDD, i.e. knowledge discovery from databases. KDD
comprises an iterative sequence of data cleaning, data integration, data selection, data mining, pattern recognition and data presentation. In particular, data mining may be accomplished using classification, clustering, prediction, association and time series analysis.

II. HEART DISEASE

The heart is a vital organ of the body, and life depends on its efficient working. If the heart does not function properly, other parts of the body, such as the brain and kidneys, are affected. The heart is simply a pump that circulates blood through the body. If the blood supply to the body is insufficient, organs such as the brain suffer, and if the heart stops working, death occurs within minutes. The term heart disease refers to disease of the heart and of the blood vessel system within it.

A number of factors increase the risk of heart disease:
• Family history of heart disease
• Smoking
• Poor diet
• High blood pressure
• High blood cholesterol
• Obesity
• Physical inactivity

Symptoms of a Heart Attack

Symptoms of a heart attack can include:
• Discomfort, pressure, heaviness, or pain in the chest, arm, or below the breastbone.
• Discomfort radiating to the back, jaw, throat, or arm.
• Fullness, indigestion, or a choking feeling (may feel like heartburn).
• Sweating, nausea, vomiting, or dizziness.
• Extreme weakness, anxiety, or shortness of breath.
• Rapid or irregular heartbeats.

Types of Heart Disease

Heart disease is a broad term that covers different diseases affecting different parts of the heart. "Cardio" means heart; therefore, all heart diseases belong to the category of cardiovascular diseases.

Some types of heart disease are:

a) Coronary heart disease: Also known as coronary artery disease (CAD), it is the most common type of heart disease worldwide. It is a condition in which plaque deposits block the coronary blood vessels, leading to a reduced supply of blood and oxygen to the heart.

b) Angina pectoris: The medical term for chest pain that occurs because of an insufficient supply of blood to the heart. Also known as angina, it is a warning sign of a heart attack. The chest pain occurs at intervals lasting a few seconds or minutes.

c) Congestive heart failure: A condition in which the heart cannot pump enough blood to the rest of the body. It is commonly known as heart failure.

d) Cardiomyopathy: The weakening of the heart muscle or a change in the structure of the muscle, associated with inadequate heart pumping. Some common causes of cardiomyopathy are hypertension, alcohol consumption, viral infections, and genetic defects.

e) Congenital heart disease: The formation of an abnormal heart because of a defect in the structure of the heart or its functioning. It is a type of disease that children are born with.

f) Arrhythmias: A disorder in the rhythmic movement of the heartbeat. The heartbeat can be slow, fast, or irregular. These abnormal heartbeats are caused by a short circuit in the heart's electrical system.

g) Myocarditis: An inflammation of the heart muscle, usually caused by viral, parasitic, and bacterial infections affecting the heart. It is a rare disease with few symptoms, such as joint pain, leg swelling or fever, that cannot be
specifically attributed to the heart.

Fig 1: A bar chart showing the percentage of deaths caused by different types of heart disease.

III. DATA MINING IN PREDICTION OF HEART DISEASE

Although data mining has been around for more than a decade, its potential is only now being realized. Data mining combines statistical analysis, machine learning and database technology to extract hidden patterns and relationships from large databases. Fayyad defines data mining as "a process of nontrivial extraction of implicit, previously unknown and potentially useful information from the data stored in a database" [10]. Giudici defines data mining as "a process of selection, exploration and modelling of large quantities of data to discover regularities or relations, with the aim of obtaining clear and useful results for the owner of the database".

Data mining mainly uses two methodologies: supervised and unsupervised learning. In supervised learning, a training set is used to learn the model parameters, whereas in unsupervised learning no training set is used, as in k-means clustering. The two most common modeling goals of data mining are classification and prediction. Classification models assign discrete, unordered values or labels, whereas prediction models estimate continuous values. Decision trees and neural networks are examples of classification models, while regression, association rules and clustering are given as examples of prediction algorithms [2].

In this paper, the following classification models of data mining are analyzed for the prediction of heart disease:

A. Decision trees
B. Neural networks
C. Naive Bayes classifier

A. Decision trees:

The decision tree approach is one of the most powerful techniques for classification in data mining. It builds models in the form of a tree structure. The dataset is broken down into smaller and smaller subsets while, at the same time, an associated decision tree is developed. Decision trees can handle both numerical and categorical data. For medical purposes, decision trees determine an order among the attributes, and a decision is then taken based on each attribute.

Various decision tree algorithms are in use. The most preferred is ID3, i.e. Iterative Dichotomiser 3, by J. R. Quinlan. ID3 uses information gain and entropy to classify data in a tree structure.

Iterative Dichotomiser 3 algorithm: The algorithm produces decision trees using Shannon entropy.

Steps:

a) Build the classification attribute (from the table).

b) Compute the classification entropy:

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_b p(x_i)$$

where $X$ is the current data set for which the entropy is calculated, $n$ is the number of classes in $X$, and $p(x_i)$ is the proportion of the number of elements in class $i$ to the number of elements in $X$.

Entropy is computed for each remaining attribute. The attribute with the smallest entropy is used to split the set in this iteration. The higher the entropy, the greater the potential to improve the classification.

c) For each attribute in the table, compute the information gain with respect to the classification attribute:

$$IG(A, X) = H(X) - \sum_{t \in T} p(t) H(t)$$

where $H(X)$ is the entropy of set $X$, $T$ is the set of subsets created by splitting $X$ on attribute $A$, and $p(t)$ is the proportion of the number of elements in $t$ to the number of elements in $X$.
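The entropy and information gain formulas above can be illustrated with a short Python sketch. This is only a minimal illustration, not the implementation used in the paper: the function names, the toy records and the attribute names (`smoker`, `high_bp`, `disease`) are assumptions made up for the example, and the logarithm base is taken as b = 2.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(X), in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """IG(A, X) = H(X) - sum over subsets t of p(t) * H(t)."""
    base = entropy([r[target] for r in rows])
    # Split X into subsets t, one per distinct value of the attribute.
    subsets = {}
    for r in rows:
        subsets.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return base - remainder


def best_attribute(rows, attributes, target):
    """Return the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy table; the records are invented for illustration only.
rows = [
    {"smoker": "yes", "high_bp": "yes", "disease": "yes"},
    {"smoker": "yes", "high_bp": "no", "disease": "yes"},
    {"smoker": "no", "high_bp": "yes", "disease": "no"},
    {"smoker": "no", "high_bp": "no", "disease": "no"},
]

for name in ("smoker", "high_bp"):
    print(name, information_gain(rows, name, "disease"))
print("split on:", best_attribute(rows, ["smoker", "high_bp"], "disease"))
```

In this toy table the `smoker` attribute separates the two classes perfectly, so it has the larger gain and would be chosen for the split; `high_bp` leaves both subsets evenly mixed and contributes no gain.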
The attribute with the highest information gain is used to split the set X in the current iteration.

d) Select the attribute with the highest gain as the next node in the tree (beginning from the root node).

e) Remove the node attribute, creating a reduced table.

f) Repeat steps c) to e) until all attributes have been used, or the same classification value remains in all rows of the reduced table. At that point, the smallest tree is preferred.

ID3 attempts to build a short decision tree from a set of learning data, but the shortest tree is not always the best classifier. Because of this limitation, ID3 was succeeded by Quinlan's C4.5 and C5.0 algorithms.

Advantages:
• Easy to understand and interpret.
• Rules are easily generated.
• Implicitly performs feature selection.
• Allows the addition of new data.

Disadvantages:
• Can suffer from overfitting.
• Non-numeric data is difficult to handle.
• Trees with many branches are difficult to understand.
• Time consuming.

B. Neural networks:

An artificial neural network is an information processing method inspired by the biological nervous system. Dr. Robert Hecht-Nielsen defines a neural network as "a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs" (in "Neural Network Primer: Part I" by Maureen Caudill, AI Expert, Feb. 1989).

A neural network is organized into a number of layers consisting of a large number of highly interconnected elements, i.e. neurons, each of which has an activation function. Patterns are presented to the network through the input layer, which communicates with one or more hidden layers, and finally the output layer generates the result. Most ANNs include a "learning rule" that modifies the weights of the connections. Learning in a neural net can be of both types, i.e. supervised learning and unsupervised learning. An artificial neural network consists of three kinds of layers: the input layer, the hidden layer and the output layer. The first layer is the input layer and the last layer is the output layer. Between the input and output layers there may be additional layers, i.e. hidden layers. A neural network can easily be trained to perform different functions by adjusting the values of the weights between elements.

Fig 2: Basic model of an artificial neural network with four inputs and one output.

There are different classes of network architecture, i.e. single-layer feed-forward networks, multi-layer feed-forward networks and recurrent networks. Artificial neural networks are a useful tool to help doctors analyze and model complex clinical data. Das, Turkoglu and Sengur constructed a neural-network-based methodology for diagnosing heart disease [7].

Advantages:
• Neural networks can easily handle missing information or noisy data.
• Once trained, the network does not need to be reprogrammed.
• It can easily work with large datasets.

Disadvantages:
• A neural network needs training to operate well.
• High processing time is required for large networks.
• Neural networks cannot be retrained, i.e. if there is any modification in the data, it is almost impossible to add it to an existing network.
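To make the layered structure concrete, the following Python sketch runs a single forward pass through a small feed-forward network with four inputs, one hidden layer and one output. It is a hedged illustration only: the sigmoid activation, the two hidden neurons and all weight, bias and input values are hypothetical choices, not taken from the paper, and the learning rule that would adjust the weights is not shown.

```python
from math import exp


def sigmoid(z):
    """Logistic activation function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> single output neuron."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)[0]


# Hypothetical, untrained parameters: 4 inputs, 2 hidden neurons, 1 output.
hidden_w = [[0.5, -0.2, 0.1, 0.4], [-0.3, 0.8, -0.5, 0.2]]
hidden_b = [0.0, 0.1]
out_w = [[1.2, -0.7]]
out_b = [-0.1]

# Four made-up, already normalised input features for one patient record.
x = [0.6, 0.3, 0.9, 0.1]
score = forward(x, hidden_w, hidden_b, out_w, out_b)
print("network output:", score)
```

Training would consist of repeatedly adjusting `hidden_w`, `hidden_b`, `out_w` and `out_b` so that the output moves closer to the known class of each training record.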
C. Naive Bayes Classifier:

A Naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem with strong (naive) independence assumptions. It is also called the "independent feature model". In general terms, a naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. Naive Bayes classifiers are trained in a supervised learning setting.

The Naive Bayes classifier assumes that the effect of a variable's value on a given class does not depend on the values of the other variables. This property is called class conditional independence. The classifier is particularly suited to cases where the dimensionality of the inputs is high. Naïve Bayes is mainly used to build models with predictive capabilities.

Bayes' Theorem:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

Let X be a data tuple and H any hypothesis. P(H|X) is the posterior probability of H conditioned on X. In the same way, P(X|H) is the posterior probability of X conditioned on H, and P(H) is the prior probability of H:

$$P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)}$$

Naive Bayes Algorithm:

1. Let D be a training set of tuples. Every record is represented by an n-dimensional attribute vector X = (x1, x2, …, xn), depicting n measurements made on the tuple from n attributes A1 to An.

2. Let there be m classes for prediction (C1, C2, …, Cm). For a record X, the classifier predicts that X belongs to the class with the maximum posterior probability conditioned on X. Naïve Bayes predicts that the tuple X belongs to class Ci only if P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i. Therefore we have to maximize P(Ci|X). By Bayes' theorem:

$$P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}$$

3. Because P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized.

4. The assumption of class conditional independence is made, i.e. the values of the attributes are assumed to be conditionally independent of one another given the class. Thus,

$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i)\, P(x_2 \mid C_i) \cdots P(x_n \mid C_i)$$

5. To predict the class of X, P(X|Ci) P(Ci) is calculated for each class Ci. Naive Bayes predicts that the class label of X is Ci if

$$P(X \mid C_i)\, P(C_i) > P(X \mid C_j)\, P(C_j) \quad \text{for } 1 \le j \le m,\ j \ne i$$

In medical data mining, the Naïve Bayes classifier plays a crucial role. It shows high performance when the attributes are independent of one another, so it can easily be used in medical diagnosis. Medical data often contains missing values, and this classifier can easily handle missing values [9].

Advantages:
• Easily handles large amounts of data.
• Requires only a small training set to estimate the parameters (mean and variance) needed for classification.
• Fast to train and fast to classify.
• Not sensitive to irrelevant features.
• Handles real and discrete data.
• Handles streaming data well.

Disadvantages:
• Loss of accuracy.
• In practice there are dependencies among variables, but these dependencies are not handled by the classifier.
• Assumes independence of features.

IV. CONCLUSION

In this paper we studied how data mining provides a set of techniques to find out
hidden patterns for decision making in healthcare organizations. We focused on the classification methods of data mining used in knowledge discovery. The different classification techniques of data mining each have merits and demerits for data classification and knowledge extraction.

Furthermore, neural networks, decision trees or naïve Bayes can be studied in more detail to implement an algorithm that is helpful in healthcare organizations.

REFERENCES

[1] K Raj Mohan, Ilango Paramasivam, Subhashini SathyaNarayan, "Prediction and Diagnosis of Cardio Vascular Disease – A Critical Survey", 2014 World Congress on Computing and Communication Technologies.

[2] Ms. Priti V. Wadal, Dr. S. R. Gupta, "Predictive Data Mining For Medical Diagnosis: An Overview Of Heart Disease Prediction", International Conference on Industrial Automation and Computing (ICIAC, 12-13th April 2014).

[3] Aqueel Ahmed, Shaikh Abdul Hannan, "Data Mining Techniques to Find Out Heart Diseases: An Overview", International Journal of Innovative Technology and Exploring Engineering (IJITEE), September 2012.

[4] K. Srinivas, B. Kavihta Rani, Dr. A. Govrdhan, "Applications of Data Mining Techniques in Healthcare and Prediction of Heart Attacks", (IJCSE) International Journal on Computer Science and Engineering, 2010.

[5] Deepali Chandna, "Diagnosis of Heart Disease Using Data Mining Algorithm", (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (2), 2014, 1678-1680.

[6] K. Sudhakar, Dr. M. Manimekalai, "Study of Heart Disease Prediction using Data Mining", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 1, January 2014.

[7] Qeethara Kadhim Al-Shayea, "Artificial Neural Networks in Medical Diagnosis", IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 2, March 2011.

[8] Ms. Rupali R. Patil, "Heart Disease Prediction System using Naive Bayes and Jelinek-mercer smoothing", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 3, Issue 5, May 2014.