
International Conference on Intelligent Computing and Control Systems

ICICCS 2017

Predictive Analytics in Health Care Using Machine Learning Tools and Techniques

B. Nithya
Senior Assistant Professor, Research Scholar
Department of MCA, New Horizon College of Engineering
Bangalore, India
nithya.boopalan@gmail.com

Dr. V. Ilango
Professor, Head of Department
Department of MCA, New Horizon College of Engineering
Bangalore, India
banalysist@yahoo.com

Abstract - When we have a huge data set on which we would like to perform predictive analysis or pattern recognition, machine learning is the way to go. Machine Learning (ML) is the fastest-rising area in computer science, and health informatics poses some of its greatest challenges. The aim of Machine Learning is to develop algorithms that can learn and improve over time and can be used for predictions. Machine Learning practices are widely used in various fields, and the health care industry in particular has benefited greatly from machine learning prediction techniques. It offers a variety of alerting and risk management decision support tools, targeted at improving patients' safety and healthcare quality. With the need to reduce healthcare costs and the movement towards personalized healthcare, the healthcare industry faces challenges in essential areas such as electronic record management, data integration, and computer-aided diagnosis and disease prediction. Machine Learning offers a wide range of tools, techniques, and frameworks to address these challenges. This paper presents a study of various prediction techniques and tools for Machine Learning in practice. A glimpse of the applications of Machine Learning in various domains is also given, highlighting its prominent role in the health care industry.

Keywords - Predictive Analytics; Machine Learning; Health Care; Prediction Algorithms and Techniques

I. INTRODUCTION
Industrial organizations work with huge amounts of data that have to be analyzed with Machine Learning. By gathering insights from these data, organizations are able to work more efficiently and gain an advantage over their competitors. Innovative predictive models have been applied successfully with Machine Learning algorithms in several domains. Machine learning techniques and their applications are in use in day-to-day activities such as search, advertisements and YouTube. Health care informatics, a multi-disciplinary field, has become synonymous with technological advancement and data handling challenges. Medical or Health Informatics is a scientific field that deals with the storage, retrieval, and optimal use of medical information and data, and provides knowledge for problem solving and decision making. Health technology has developed immensely over the years, with advances in information gathering, treatments, communications and research.

II. PREDICTIVE ANALYTICS
Predictive Analytics is the branch of advanced analytics that is used to make predictions about unknown future events. Predictive analytics applies many techniques from data mining, statistics, modelling, machine learning, and artificial intelligence to investigate current findings and make predictions about the future. The predictor is the main entity in predictive analytics; it is defined as a variable used for the measurement of future behaviour. With the help of predictors, upcoming probabilities are forecast with highly reliable results. The approaches used to conduct predictive analytics can be classified into machine learning techniques and regression techniques. Machine learning techniques have become progressively popular in conducting predictive analytics due to their outstanding performance in managing large-scale datasets with uniform characteristics and noisy data. Observational studies show that machine learning is appropriate for building predictive models by extracting patterns from large datasets. These models are widely used in predictive data analytics applications such as price prediction, risk assessment, predicting customer behaviour, and document classification.

III. PREDICTIVE ANALYTICS IN HEALTH CARE
Predictive Analytics supports different segments of health care, life sciences and providers. It aims at accurate diagnosis of diseases, enhanced patient care, resource optimization and improved clinical outcomes. Predictive Analytics helps organizations plan for health care by optimizing cost [5]. The adoption of predictive analytics in this industry is likely to yield better outcomes by improving service quality. Predictive Analytics has the potential to transform the health care industry.

IV. MACHINE LEARNING
The classic definition of Machine Learning is: a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E [27]. Machine Learning is a division of artificial intelligence that applies a variety of statistical, probabilistic and optimization techniques that allow computers to learn from prior examples and to detect hard-to-discern patterns

978-1-5386-2745-7/17/$31.00 ©2017 IEEE 492



from huge, noisy or complex data sets. Machine Learning is a method of data analysis that automates analytical model building. Through procedures that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.

A. Machine Learning's Role in Predictions
Enterprises are motivated to find greater meaning in the substantial amounts of data they generate and save every day. Machine learning provides effective algorithms, applications, and frameworks that bring greater predictive accuracy and value to enterprises' data sets and contribute to the success of diverse strategies. Machine learning methodologies are intended to find opportunities to optimize decisions based on the predictive value of large-scale data sets. It is proving effective at handling predictive tasks, including determining which behaviors have the greatest tendency to drive preferred outcomes.

B. Steps to apply machine learning to data
A machine learning task can be broken into the steps below.
1) Collecting data: Whether the data is written on paper, recorded in text files and spreadsheets, or stored in an SQL database, the data need to be gathered in an electronic format suitable for analysis. This data will serve as the learning material an algorithm uses to generate actionable information.
2) Exploring and preparing the data: The quality of any machine learning project is based largely on the quality of data it uses. This stage in the machine learning process tends to require a great deal of human intervention. An often-cited statistic suggests that 80 percent of the effort in machine learning is dedicated to data. Much of this time is spent learning more about the data and its nuances through a practice called data exploration.
3) Training a model on the data: The specific machine learning task will inform the selection of an appropriate algorithm, and the algorithm will represent the data in the form of a model.
4) Evaluating model performance: It is very important to estimate how well the algorithm learned from its past experience, since each machine learning model results in a biased solution to the learning problem. The accuracy of the model can be evaluated using a test dataset, depending on the type of model used.
5) Improving model performance: If better performance is needed, more advanced strategies must be used to improve the performance of the model. Sometimes it may be necessary to switch to a different type of model altogether.

Fig. 1. Machine Learning Process

After these steps have been completed, if the model appears to be performing acceptably, it can be deployed for its intended task. The model may be used to provide score data for predictions, for projections of financial data, to generate useful insight for marketing or research, or to automate tasks. The successes and failures of the deployed model might even provide additional data to train the next generation model.

V. MACHINE LEARNING ALGORITHMS
There are many algorithms available in machine learning, and they are classified into three broad categories depending on the nature of the learning.
The process of selecting a machine learning algorithm involves matching the features of the data to be learned to the biases of the existing approaches. Machine learning algorithms can be divided into two key groups: supervised learners, which are used to construct predictive models, and unsupervised learners, which are used to build descriptive models.

Fig. 2. Types of Machine Learning Process

A. Supervised Learning / Predictive Models
A predictive model is used for tasks that involve the prediction of one value using other values in the dataset. Because predictive models are given clear instruction on what they need to learn and how they are intended to learn it, the process of training a predictive model is known as supervised learning. The aim of supervised machine learning is to build a model that makes predictions based on evidence in the presence of uncertainty. Specifically, a supervised learning algorithm takes a known set of input data and known responses to the data (output), and trains a model to generate reasonable predictions for the response to new data.

Fig. 3. Supervised Learning
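
The five steps of Section IV-B, applied to a supervised learner, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' procedure: the data set is synthetic, the learner is a k-nearest-neighbour classifier (one of the instance-based methods named in Table I), and the function names, the 60/20/20 split and the candidate values of k are all hypothetical choices made for the example.

```python
import random
from collections import Counter


def collect_data(n=300, seed=0):
    """Step 1: stand-in for data collection (synthetic, two numeric features)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.choice([0, 1])
        centre = 2.0 if label == 1 else 0.0
        features = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        rows.append((features, label))
    return rows


def prepare_data(rows, seed=1):
    """Step 2: shuffle, then split 60/20/20 into train, validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    a, b = int(0.6 * len(rows)), int(0.8 * len(rows))
    return rows[:a], rows[a:b], rows[b:]


def knn_predict(train, x, k):
    """Step 3: k-NN 'training' just stores the data; prediction is a majority vote."""
    def squared_distance(row):
        return sum((u - v) ** 2 for u, v in zip(row[0], x))

    nearest = sorted(train, key=squared_distance)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, held_out, k):
    """Step 4: fraction of held-out examples whose label is predicted correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return correct / len(held_out)


train, validation, test = prepare_data(collect_data())

# Step 5: improve the model by choosing k on the validation set (odd k avoids ties).
candidate_ks = (1, 3, 5, 7)
best_k = max(candidate_ks, key=lambda k: accuracy(train, validation, k))

# Report performance once, on data not used for fitting or tuning.
test_accuracy = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

Tuning k on a separate validation set, rather than on the test set, keeps the final accuracy figure an honest estimate of how the model behaves on new data.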


TABLE I. SUPERVISED AND UNSUPERVISED LEARNING ALGORITHMS

S. No. | Type of Learning | Model / Method | Extensively Used Algorithms
1 | Supervised Learning | Decision Tree Technique | Classification and Regression Tree (CART); Iterative Dichotomiser 3 (ID3); C4.5 and C5.0; Chi-squared Automatic Interaction Detection (CHAID); Decision Stump; M5; Conditional Decision Trees
  |  | Bayesian Methods | Naive Bayes (NB); Gaussian Naive Bayes; Multinomial Naive Bayes; Averaged One-dependence Estimators (AODE); Bayesian Belief Network (BBN); Bayesian Network (BN)
  |  | Artificial Neural Network | Perceptron; Back-Propagation; Hopfield Network; Radial Basis Function Network (RBFN)
  |  | Instance Based Learning | K-Nearest Neighbour (KNN); Learning Vector Quantization (LVQ); Self-Organizing Map (SOM); Locally Weighted Learning (LWL)
  |  | Ensemble Methods | Boosting; Bootstrapped Aggregation (Bagging); AdaBoost; Stacked Generalization; Gradient Boosting Machines (GBM); Gradient Boosted Regression Trees (GBRT); Random Forest
2 | Unsupervised Learning | Clustering Methods | k-Means; k-Medians; Expectation Maximization (EM); Hierarchical Clustering
3 | Supervised / Unsupervised Learning | Regression Algorithms | Ordinary Least Squares Regression (OLSR); Linear Regression; Logistic Regression; Stepwise Regression; Multivariate Adaptive Regression Splines (MARS); Locally Estimated Scatterplot Smoothing (LOESS)

B. Unsupervised Learning / Descriptive Models
A descriptive model is used for tasks that would benefit from the insight gained from summarizing data in new and interesting ways. In contrast to predictive models, which predict a target of interest, in a descriptive model no single feature is more important than any other. In fact, because there is no target to learn, the process of training a descriptive model is called unsupervised learning. The aim is to explore the data and find some structure within it. Unsupervised learning works well on transactional data. Popular methods include self-organizing maps, nearest-neighbor mapping, k-means clustering and singular value decomposition. These algorithms are used to segment text topics, recommend items and identify data outliers.

C. Semi Supervised Learning
Semi-supervised learning is used for the same applications as supervised learning. However, it uses both labelled and unlabelled data for training. Tasks such as classification, regression and prediction use this type of learning. Semi-supervised learning is very useful when the cost associated with labelling is too high to allow for a fully labelled training process. Recognizing a person's face on a webcam is one example.

D. Reinforcement Learning (RL)
In this type of learning, the machine is trained to take specific decisions based on the business requirement, with the objective of maximizing efficiency (performance). This repetitive learning process requires less participation of human experts and saves time. Reinforcement learning is often used in robotics, gaming and navigation. With reinforcement learning, the algorithm discovers through trial and error which actions yield the best rewards.

VI. MACHINE LEARNING TOOLS
Machine learning offers a collection of tools that use computers to convert data into actionable information. Tools are a huge part of machine learning, and selecting the right tool can be as important as working with the best algorithms. Machine learning tools make applied machine learning faster and easier. Good tools can automate each step in the applied machine learning process, reducing the time required.
Three ways to think about machine learning tools are described here.
• Platforms versus Libraries
• Graphical User Interfaces versus Command-Line Interface versus Application Programming Interfaces
• Local versus Remote

A. Platforms versus Libraries:
1) Platform Tools:
• Provide facilities required at each step in a machine learning project.
• The interface may be graphical or command line.
• They afford a loose coupling of features.


• They are provided for general purpose use and exploration rather than speed, scalability or accuracy.

2) Library Tools:
• Provide capabilities for completing part of a machine learning project.
• Provide a specific capability for one or more steps in a machine learning project.
• The interface is normally an application programming interface requiring programming.
• They are customized for a specific use case, problem type or environment.

B. Graphical User Interfaces (GUI) versus Command-Line Interface (CLI) versus Application Programming Interfaces (API):
1) Graphical User Interfaces - GUI:
• Allow less-technical users to work through machine learning.
• Focus on process and how to get the most from machine learning techniques.
• Structured process imposed on the user by the interface.
• Major focus on graphical presentations of information such as visualization.

2) Command-Line Interface - CLI:
• Permits technical users who are not programmers to work through machine learning projects.
• Presents many small, focused programs or program modes for specific sub-tasks of a machine learning project.
• Frames machine learning tasks in terms of the input required and output to be generated.
• Promotes reproducible results by recording or scripting commands and command line arguments.

3) Application Programming Interfaces - API:
• Used to integrate machine learning into our own software projects.
• Allows creating our own machine learning tools.
• Gives the flexibility to apply our own processes and automations on machine learning projects.
• Allows combining our own methods with those supplied by the library as well as extending provided methods.

C. Local versus Remote:
1) Local Tools: Can be downloaded, installed and run on the local environment.
• Tailored for in-memory data and algorithms.
• Control over configuration and parameterization.
• Integrated into our own systems to meet all our needs.

2) Remote Tools: They can be hosted on a server and called from the local environment. These tools are referred to as Machine Learning as a Service (MLaaS).
• Adapted for scale to be run on larger datasets.
• Run across multiple systems, multiple cores and shared memory.
• Fewer algorithms because of the modifications essential for running at scale.
• Simpler interfaces providing less control over run configuration and algorithm parameterization.
• Integrated into our local environment through remote procedure calls.

D. Machine Learning Using R
R is a leading choice among data professionals who want to understand and explore data using statistical methods and graphs. It has several machine learning packages with advanced implementations of the top machine learning algorithms. R is an open source language, and functions in R are grouped into packages; the base packages are loaded automatically when R starts. The process of a machine learning project involves the following steps:
• Defining the Problem
• Preparing the Data
• Evaluating the Algorithms
• Improving the Results
• Presenting the Results
Testing and evaluation of any Machine Learning algorithm require a variety of datasets. R comes with its own datasets. Various functions are available in R for analyzing and plotting the data. Several Machine Learning algorithms can be implemented in R for the chosen datasets, and their results can be compared to guide improvements.

Fig. 4. Plotting Examples in R

Various plotting styles are available in R. Scatter plots, histograms and box plots are shown in Fig. 4 as examples. Various features of the data sets can be evaluated and distinguished through these styles.

Construction of predictive models in R can proceed as follows.
• Model creation using a basic function.
• Evaluation of the model properties using print, plot and other methods.
• Prediction of outcomes for samples using the predict method.
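
The create, inspect, predict pattern described for model construction in R can be illustrated with a small analogue. The block below is a Python sketch rather than R code, written only to show the shape of the workflow: the `LinearModel` class, the `fit_linear` function and the five data points are hypothetical, and the model is a one-variable ordinary least squares fit (OLSR appears in Table I).

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Fitted line y = intercept + slope * x, with its R-squared on the training data."""
    intercept: float
    slope: float
    r_squared: float

    def predict(self, xs):
        """Prediction step: apply the fitted line to new samples."""
        return [self.intercept + self.slope * x for x in xs]


def fit_linear(xs, ys):
    """Model creation step: closed-form ordinary least squares for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return LinearModel(intercept, slope, 1.0 - ss_residual / ss_total)


# Hypothetical measurements, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

model = fit_linear(xs, ys)      # 1. create the model
print(model)                    # 2. inspect its properties
forecast = model.predict([6.0]) # 3. predict outcomes for new samples
print(forecast)
```

Keeping fitting, inspection and prediction as separate calls means the same fitted object can be examined and reused on new samples without refitting.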


TABLE II. MACHINE LEARNING TOOLS

Category / Tools / Features

Platforms vs. Libraries
  WEKA - Waikato Environment for Knowledge Analysis
    • A modern platform for applied machine learning.
    • Allows completing machine learning projects without programming.
    • Comprises tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
  R
    • One of the most efficient and most widespread platforms for statistical programming and applied machine learning.
    • An implementation of the S language whose semantics are influenced by Scheme, a Lisp dialect.
    • Can parse and execute R scripts that are typed in directly or loaded from a file with .R extension.
    • Can create graphics to be displayed on the screen or saved to file, and prepare models that can be queried and updated.
  Scikit-learn
    • A Python module for machine learning built on top of SciPy.
    • Used for math and science work.
    • Can be used interactively as a "workbench" or embedded into other software and reused.
  Accord
    • The Accord.NET Framework delivers machine learning, mathematics, statistics, computer vision, computer audition, and several scientific computing related methods and techniques to .NET.
    • Contains libraries that provide a more conventional range of machine learning functions.
  JSAT
    • Java Statistical Analysis Tool, a Java library for Machine Learning.

GUI vs. CLI vs. API
  Orange
    • An appropriate tool for quick comparisons across estimation techniques.
    • A slick desktop app where data can be processed through a number of steps and estimation techniques.
    • Uses a widget and connector construction on the GUI.
  Waffles
    • Waffles apps are thin wrappers around functionality in a well-documented C++ class library.
    • Script-friendly, and designed to be used with automated processes.
    • Waffles algorithms can automatically tune their own parameters.
  Deeplearning4j
    • A commercial-grade, open-source, distributed deep-learning library written for Java and Scala.
    • Integrated with Hadoop and Spark.
    • DL4J is used in business environments on distributed GPUs and CPUs.

Local vs. Remote
  Shogun
    • Implemented in C++, with interfaces to Matlab(tm), R, Octave, Java, C#, Ruby, Lua and Python.
    • Designed for unified large-scale learning for a wide range of feature types and learning settings.
  GoLearn
    • Google's Go language has a wide collection of libraries.
    • GoLearn is a 'batteries included' machine learning library for Go.
    • Created to address the lack of an all-in-one machine learning library for Go.
  Google Prediction API
    • Provides pattern-matching and machine learning capabilities.
    • After learning from training data, Prediction API can predict a numeric value or choose a category that describes a new piece of data.
    • Examples: guessing what movies or products a user might like, labeling emails as spam or non-spam, or guessing how much a user might spend on a given day.
  AWS
    • Amazon Machine Learning offers visualization tools and wizards that guide users through the process of creating ML models without having to learn complex ML algorithms and technology.
  Microsoft Azure
    • A fully managed cloud service that enables users to easily build, deploy and share predictive analytics solutions.


VII. AID OF MACHINE LEARNING IN VARIED FIELDS
Machine learning has played a major part in recent years, with significant developments in various fields driven by it. ML allows machines to make decisions from massive data. Companies such as Google, Amazon, Accenture, Toyota, Hitachi, Tesla and Johnson & Johnson have embraced machine learning at immense scale and improved their products and services. Small companies have also developed innovative applications using machine learning. Amazon launched its machine learning platform in 2015 and showed customers more helpful reviews, Google used ML to translate text in 27 languages, and Tesla adopted ML in its Autopilot technology [20]. Machine learning is being used in a wide range of application domains, and a few examples are listed here.

Fig. 5. Machine Learning in Various Disciplines

A. Financial Services and Economics
Banks and other businesses in the financial industry use machine learning methodologies to identify significant insights in data and to prevent fraud. A group of researchers in the Machine Learning Technologies group at IBM Research - Haifa are taking fraud prevention and detection to a new level. The proposed solution analyzes historical transaction data to build a model that can detect fraudulent patterns. This model is then used to process and examine a large number of financial transactions as they occur in real time, an approach also known as stream computing.

B. Administration in Government Organizations
Government agencies such as public safety and utilities have a specific need for machine learning since they have multiple sources of data that can be mined for insights. Machine learning can also help to detect fraud and minimize identity theft. In the U.K., the Government Digital Service (GDS) has been experimenting with special applications such as predicting page views for anomaly detection and, so far, is focusing on demonstrating the capabilities of machine learning algorithms on a number of products and prototype services.

C. Health Care Industry
Machine learning is a fast-growing approach in the health care industry. ML technology can help medical experts analyze data to identify risks, leading to improved diagnoses and treatment. An IBM research group, Machine Learning for Healthcare and Life Sciences, is developing and applying machine learning and data mining tools to a group of diverse, challenging problems, from clinical genomic analysis, through designing clinical decision support systems, to investigating real world evidence for personalized medicine.

D. Promotion, Sales and Marketing
Websites recommend items to purchase based on customers' previous purchases. They use machine learning to analyze buying history and promote other items the customer would be interested in. This ability to capture data, analyze it and use it to personalize a shopping experience is the future of retail. Microsoft's cloud-based predictive analytics tool, Azure Machine Learning, is being applied to sales forecasting and is active in this area.

E. Shipping and Transportation
Analyzing data to recognize patterns and trends is key to the transportation industry, which relies on making routes more efficient and predicting possible problems to increase profitability. Traffic has been rising in major cities around the world given the growing number of cars on roads and the slow development of road infrastructure. Research scientists and developer teams at Microsoft Research pioneered the use of machine learning methods to build predictive models for traffic. The work focused early on prototypes that can infer and predict the flow of traffic at different times in the future, based on analysis of large amounts of traffic data collected over months and years.

VIII. MACHINE LEARNING IN HEALTH CARE PREDICTIONS
Machine learning algorithms are effective in spotting complex patterns in massive data. This capability is well suited to medical applications, especially those that depend on complex proteomic and genomic measurements. Hence machine learning is frequently used in the diagnosis and detection of various diseases. In medical applications, machine learning algorithms can support better decisions about treatment plans for patients and so contribute to an effective healthcare system. Machine learning plays a key role in many radiology applications [32]. It identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI and PET images, and radiology reports. Observational studies have shown that various types of disease prediction have been carried out using different machine learning prediction techniques. In recent times, diagnosis and prediction of various diseases such as cardiovascular diseases, cancers, diabetes, hepatitis, asthma and tuberculosis (TB), as well as blood pressure monitoring, have been carried out using various machine learning prediction techniques. A few prediction models applied to major diseases through machine learning are discussed here.

A. Predictions on Cardio Vascular Diseases
Cardiovascular disease (CVD) is a class of diseases that involve the heart or blood vessels. Cardiovascular disease comprises coronary artery diseases (CAD) such as angina and myocardial infarction (heart attack), stroke, hypertensive heart


disease, rheumatic, cardiomyopathy, atrial fibrillation, the physicians [30]. This work anticipated the use of decision
congenital heart disease, endocarditis, aortic aneurysms, trees C4.5 algorithm, ID3 algorithm and CART algorithm to
peripheral artery disease and venous thrombosis. Diagnosis categorize the diseases. Few studies proposed the ways to
and prediction of heart diseases is very much essential as it is obtain accurate prediction results and more informative
one of the leading causes of death all over the world [31]. knowledge about the Hepatitis C Virus (HCV) protein
Enormous works on prediction of heart diseases are carried cleavage sites using Decision tree algorithm [1].
out in health care research. The two most effective tools,
D. Cancer Predictions Using Machine Learning
neural networks and genetic algorithms presented a technique
for prediction of heart disease using major risk factors like Cancer is the universal name for a group of more than
age, family history, diabetes, hypertension, cholesterol, 100 diseases. While there are many kinds of cancers, all
smoking, alcohol intake, obesity or physical inactivity [3]. The cancers start as abnormal cells grow out of control. Over the
methods Naive Bayes, Neural network and Decision tree past decades, a continuous evolution related to cancer research
algorithms are used on analysis of medical data sets with has occurred concerning the prognosis and prediction of
respect to heart disease prediction [19]. A prototype Intelligent cancers like lung, skin, liver and stomach cancers [2, 24].
Heart Disease Prediction System (IHDPS) has been developed Several data mining and machine learning techniques are used
using Decision Trees, Naive Bayes and Neural Network. for cancers prediction and some of the works are considered at
Results exhibited that each technique has its unique strength in this point.
realizing the objectives of the defined mining goals [28]. In recent years of research, a few optimized models on
breast cancer diagnosis and recurrence prediction were
B. Diabetes Predictions developed by using various risk prediction algorithms and
Diabetes mellitus is a chronic disease and a major public techniques [17, 7]. There are reviews about various researches
health challenge worldwide. It develops when there are high made to predict cancer prognosis across cancer types and
blood sugar levels over a prolonged period. Recently, it has subtypes [16]. This review suggests to validate a method
been quoted as a risk factor for developing Alzheimer, and a across multiple sets of patients and conclusion is to understand
leading cause for blindness & kidney failure. Many techniques which data sets are most informative for a particular cancer
have been discovered to find the causes of diabetes and cure it. type/subtype. Various studies described the practical
Some of the predictions on diabetes is stated here. application of data mining and machine learning methods for
The research on diabetes prediction discussed about estimation of survival rate and disease relapse for breast
establishing a relationship between diabetes risk developed from a person's daily lifestyle activities, such as eating habits, sleeping habits and physical activity, along with other indicators like BMI (Body Mass Index) and waist circumference [4]. A Chi-Squared Test of Independence was performed, followed by application of the CART (Classification and Regression Trees) machine learning algorithm on the data. An analysis of the performance of the Naïve Bayes classifier, RBF network and SVM classifier on heart, cancer and diabetes datasets shows that the SVM classifier produces better classification accuracy [21]. The work was executed in the WEKA environment, and the results show that SVM is the most robust and effective classifier for medical data sets.

C. Hepatitis Disease Prediction

Hepatitis means an injury to the liver with inflammation of the liver cells. The illness can be self-limiting or can progress to fibrosis, cirrhosis or liver cancer. Hepatitis viruses are the most common cause of hepatitis in the world, but other infections and toxic substances can also cause hepatitis. There are five main hepatitis viruses, referred to as types A, B, C, D and E. For the diagnosis of hepatitis disease, a novel machine learning method was proposed that hybridizes a support vector machine (SVM) and Simulated Annealing (SA) [22]. Simulated annealing is a stochastic method currently in wide use for optimization. There was also a focus on medical diagnosis by learning patterns from collected hepatitis data and on developing intelligent medical decision support systems to help

likely to be cancer patients [13]. A comparative study of prominent machine learning models was carried out in which the Naive Bayes classifier was selected as the model for prognosis of cancer survivability on the basis of the 5-year survival rate, while the Artificial Neural Network achieved the best performance in prognosis of breast cancer recurrence [18]. A research study implemented machine learning techniques such as Decision Tree (C4.5), Support Vector Machine (SVM) and Artificial Neural Network (ANN) to develop predictive models for recurrence prediction in breast cancer. The main idea of this work was to compare the performance of these three well-known algorithms on the data through sensitivity, specificity and accuracy. This work concluded that the SVM classification model predicted breast cancer recurrence with the lowest error rate and highest accuracy.

A study that applied three machine learning approaches (support vector machine, C5.0 and extreme learning machine) to find important risk factors for predicting recurrence-proneness for cervical cancer in a small number of patients showed that the C5.0 model is the most useful approach for discovering recurrence-proneness factors [12]. To diagnose recurrence in cervical cancer, a probabilistic model method has been used. In cervical cancer recurrence prediction research, MARS and C5.0 classifier techniques were used, and the results showed that the decision tree model is better than MARS [6, 11].

In recent times, diagnosis and prediction for further diseases such as Asthma, Tuberculosis (TB) and Blood Pressure Monitoring have been carried out using various data mining and machine learning prediction techniques [25, 5, 23].
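The studies summarized above compare classifiers through sensitivity, specificity and accuracy. As a minimal sketch of how these three measures can be computed from true and predicted labels, the Python snippet below counts the confusion-matrix cells and derives each measure. The function names, the label encoding (1 = recurrence, 0 = no recurrence) and the toy labels are assumptions made for illustration; they are not taken from the cited studies.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = recurrence, 0 = none)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth not in (0, 1) or pred not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        else:  # truth == 1 and pred == 0
            counts["fn"] += 1
    return counts


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return sensitivity, specificity and accuracy for one classifier."""
    c = confusion_counts(y_true, y_pred)
    positives = c["tp"] + c["fn"]  # patients who truly had a recurrence
    negatives = c["tn"] + c["fp"]  # patients who truly had none
    total = positives + negatives
    return {
        "sensitivity": c["tp"] / positives if positives else float("nan"),
        "specificity": c["tn"] / negatives if negatives else float("nan"),
        "accuracy": (c["tp"] + c["tn"]) / total if total else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical ground truth and predictions from two hypothetical models.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predictions = {
        "model_a": [1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
        "model_b": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    }
    for name, y_pred in predictions.items():
        print(name, evaluate(y_true, y_pred))
```

In this toy data both hypothetical models reach the same accuracy (0.8), but model_a detects more of the true recurrences (sensitivity 0.75 versus 0.5). This is one reason such comparisons tend to report sensitivity and specificity alongside accuracy.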


IX. CONCLUSION

In the past two decades, Machine Learning has been transformed from an academic curiosity into a multi-billion-dollar industry and a point of focus for our economic, social, scientific, and security infrastructure. Abundant work in machine learning has drawn interest to research in optimization, motivated by large-scale applications requiring analysis of massive high-dimensional data.

This study shows that Machine Learning has given medical providers new tools to work with and novel ways to practice medicine. It also confirms that machine learning tools and techniques are decisive in the health care domain and are used extensively in the diagnosis and prediction of various types of cancers. There are many open problems and future challenges in dealing with massive amounts of heterogeneous, distributed, diverse, highly dynamic data sets and increasingly large amounts of unstructured and non-standardized information with respect to varied types of cancers. Some of the most important challenges in clinical practice and biomedical research include the need to develop and apply novel tools for the effective integration, analysis and interpretation of complex biomedical data, with the aim of identifying testable hypotheses and building accurate models to diagnose and predict various types of cancers and their recurrences. The algorithms must also handle incomplete, noisy, and even contradictory or ambiguous information. Therefore, efficient machine learning approaches become essential in the health care industry to address these challenges. Machine Learning and predictive analytics techniques could revolutionize the entire healthcare industry by providing accurate insights and predictions related to symptoms, diagnoses, procedures and medications.

REFERENCES

[1] Ahmed Mohamed Samir, "A Data Mining Approach for the Prediction of Hepatitis C Virus Protease Cleavage Sites", International Journal of Advanced Computer Science and Applications, Vol. 2, No. 12, December 2011.
[2] Akutekwe, Arinze, Huseyin Seker, and Sunday Iliya, "An optimized hybrid dynamic Bayesian network approach using differential evolution algorithm for the diagnosis of Hepatocellular Carcinoma", Adaptive Science & Technology (ICAST), 2014 IEEE 6th International Conference, IEEE, 2014.
[3] Amin, S.U.; Agarwal, K.; Beg, R., "Genetic neural network based data mining in prediction of heart disease using risk factors", IEEE Conference on Information & Communication Technologies (ICT), 2013, pp. 1227-31, 11-12 April 2013.
[4] Anand, Ayush, Shakti, Divya, "Prediction of diabetes based on personal lifestyle indicators", 1st International Conference on Next Generation Computing Technologies (NGCT), pp. 673-676, 4-5 Sept. 2015.
[5] B Nithya, "Study on Predictive Analytics Practices in Health Care System", IJETTCS, Vol. 5, Feb. 2016.
[6] Bahl, Ravinder, S. K. Spolia, and Chandra Mauli Sharma, "Predicting recurrence in cervical cancer patients using clinical feature analysis", British Journal of Medicine and Medical Research 6.9 (2015): 908-917.
[7] Bhardwaj, Arpit, and Aruna Tiwari, "Breast cancer diagnosis using genetically optimized neural network model", Expert Systems with Applications 42.10 (2015): 4611-4620.
[8] Brett Lantz, "Machine Learning with R", Packt Publishing Ltd, 2013, ISBN 978-1-78216-214-8.
[9] Cancer Informatics 2014:13(S2) 19–28 doi: 10.4137/CIN.S13788.
[10] Charles R. Farrar, Keith Worden, "Structural Health Monitoring, A Machine Learning Perspective", John Wiley & Sons, Ltd., ISBN: 978-1-119-99433-6.
[11] Chi-Chang Chang, Sun-Long Cheng, Chi-Jie Lu and Kuo-Hsiung Liao, "Prediction of Recurrence in Patients with Cervical Cancer Using MARS and Classification", International Journal of Machine Learning and Computing, Vol. 3, No. 1, February 2013.
[12] Chih-Jen Tseng, Chi-Jie Lu, Chi-Chang Chang, Gin-Den Chen, "Application of machine learning to predict the recurrence-proneness for cervical cancer", Neural Computing and Applications, Springer, May 2014, Vol. 24, Issue 6, pp. 1311-1316.
[13] Cirkovic, Bojana R. Andjelkovic; Cvetkovic, Aleksandar M.; Ninkovic, Srdjan M.; Filipovic, Nenad D., "Prediction models for estimation of survival rate and relapse for breast cancer patients", IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 1-6, 2-4 Nov. 2015.
[14] Daniel Combs, Safal Shetty, Sairam Parthasarathy, "Big-Data or Slim-Data: Predictive Analytics Will Rule with World", J Clin Sleep Med 16.
[15] Daniel Pop, Gabriel Iuhasz, "Overview of Machine Learning Tools and Libraries".
[16] Das, Jishnu, Kaitlyn M. Gayvert, and Haiyuan Yu, "Predicting cancer prognosis using functional genomics data sets", Cancer Informatics 13. Suppl. 5 (2014): 85.
[17] Drukker, Caroline A, "Optimized outcome prediction in breast cancer by combining the 70-gene signature with clinical risk prediction algorithms", Breast Cancer Research and Treatment 145.3 (2014): 697-705.
[18] Eshlaghy et al., "Using Three Machine Learning Techniques for Predicting Breast Cancer Recurrence", J Health Med Inform 2013, 4:2.
[19] Gandhi M, Singh, S.N, "Predictions in heart disease using techniques of data mining", 2015 International Conference on Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE), pp. 520-525, 25-27 Feb. 2015.
[20] https://www.research.ibm.com/
[21] Janardhanan, Padmavathi L, Heena, Sabika, Fathima, "Effectiveness of Support Vector Machines in Medical Data mining", Journal of Communications Software & Systems, Mar 2015, Vol. 11, Issue 1, pp. 25-30.
[22] Javad Salimi Sartakhti, Mohammad Hossein Zangooei, Kourosh Mozafari, "Hepatitis disease diagnosis using a novel hybrid method based on support vector machine and simulated annealing (SVM-SA)", Computer Methods and Programs in Biomedicine, Volume 108, Issue 2, November 2012, Pages 570–579, Elsevier.
[23] Kaur et al., "Using hybrid models to predict blood pressure reactivity to unsupported back based on anthropometric characteristics", Front Inform. Technology Electron. Eng. 2015.
[24] Konstantina Kourou, Themis P. Exarchos, Konstantinos P. Exarchos, Michalis V. Karamouzis, Dimitrios I. Fotiadis, "Machine learning applications in cancer prognosis and prediction", Elsevier, Computational and Structural Biotechnology Journal 13 (2015) 8–17.
[25] Luo et al., "Predicting asthma control deterioration in children", BMC Medical Informatics and Decision Making (2015) 15:84.
[26] M. I. Jordan, T. M. Mitchell, "Machine learning: Trends, perspectives, and prospects", Science 349, 255 (2015).
[27] Tom Mitchell, "Machine Learning", McGraw Hill, 1997.
[28] Palaniappan S, Awang R, "Intelligent heart disease prediction System using data mining techniques", IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2008, pp. 108-115, March 31 2008-April 4 2008.
[29] S. Muthukaruppan, "A hybrid particle swarm optimization based fuzzy expert system for the diagnosis of coronary artery disease", Expert Systems with Applications, (2012) 11657–1166.
[30] Sathyadevi G, "Application of CART algorithm in hepatitis disease diagnosis", International Conference on Recent Trends in Information Technology (ICRTIT), pp. 1283-1287, 3-5 June 2011.
[31] Shanthi Mendis, Pekka Puska, Bo Norrving, World Health Organization (2011), "Global Atlas on Cardiovascular Disease Prevention and Control", pp. 3–18. ISBN 978-92-4-156437-3.
[32] Shijun Wang, Ronald M. Summers, "Machine Learning and Radiology", Med Image Anal. 2012 July; 16(5): 933–951.
