A Healthcare Monitoring System Using Random Forest & IOT (MTAP)
https://doi.org/10.1007/s11042-019-7327-8
Abstract
The Internet of Things (IoT) has enabled many types of applications in the field of
information technology, and smart, connected health care is one of the most crucial of
them. Networked sensors can capture our physical and mental health information, which
can be used to bring about a positive transformation in the health care landscape.
They make it possible for monitoring to reach people who do not have ready access to
an effective health monitoring system. The captured data can then be analyzed using
various machine learning algorithms and shared through wireless connectivity with
medical professionals, who can make appropriate recommendations. Such scenarios
already exist, but we intend to enhance them by analyzing past data to predict future
problems using prescriptive analytics. This will allow us to move from a reactive to a
visionary approach by rapidly spotting trends and making recommendations on behalf of
the actual medical service provider. In this paper, the authors have applied different
machine learning techniques to public health care datasets stored in the cloud to
build a system that allows real-time and remote health monitoring built on IoT
infrastructure and associated with cloud computing. The system will be able to drive
recommendations based on the historic and empirical data residing in the cloud. The
authors have proposed a framework to uncover knowledge in a database, bringing to
light hidden patterns that can help in credible decision making. This paper evaluates
prediction systems for diseases such as heart disease, breast cancer, diabetes,
spect_heart, thyroid, dermatology, liver disorders and surgical data using a number of
input attributes related to each particular disease. Experiments are conducted using
the machine learning algorithms considered in this paper, namely K-NN, Support Vector
Machine, Decision Trees, Random Forest, and MLP.
* Munish Kumar
munishcse@gmail.com
1 Introduction
through IoT agent/mobile devices on the cloud network. In the cloud network, data is
processed and stored in an efficient manner. Doctors take action remotely by processing
the data and uploading their responses onto the cloud. A primary concern of any
healthcare initiative is the quality of service in diagnosing disease and in delivering
effective treatment to patients. Diagnosis is a complicated but essential undertaking
that needs to be executed efficiently. The prediction is frequently based on the
doctor’s experience and understanding, which may at times be erroneous and lead to
undesirable effects. Therefore, there is a need for an automated medical analysis
system that takes advantage of the collected information base and a decision support
system. Such a system can assist in diagnosing disorders more readily, with fewer
medical tests, and even before major symptoms are felt by the patient. To realize this,
the authors make use of the healthcare information systems managed by hospitals. They
collected huge amounts of data in order to extract hidden information and build an
intelligent disease prediction system that diagnoses disease using a historical
database of various diseases. Therefore, machine learning techniques are adopted for
processing the large amount of patient data and for effectively extracting inherent,
previously concealed and substantially useful information in the e-healthcare network.
Tao et al. [10] presented a new classifier named multicolumn BLSTM (MBLSTM), which
effectively combines diverse acceleration signal features to further improve activity
recognition accuracy. They also concluded, from experiments on a mobile phone based
human activity dataset, that MBLSTM is better than other state-of-the-art methods. Tao
et al. [11] designed a framework for robustly learning the semantic content of mobile
images, based on which the relevant labels are assigned to the query images. Their
proposed framework is highly insensitive to noise and outliers, and is optimized by a
half-quadratic optimization scheme. They also concluded that their image labeling
approach is more robust than other sparse coding methods. The aim of this paper is to
evaluate the performance of existing machine learning techniques on various public
datasets for health care applications and to present an efficient methodology for
predicting disease based on an IoT network. The paper is organized as follows.
Section 2 describes related work on disease prediction in health care. Section 3
presents the proposed prediction model for detecting a number of diseases using machine
learning techniques and IoT. The significance of the proposed work is presented in
Section 4. Section 5 presents the experimental results and performance analysis.
Finally, concluding notes on the present work are given in Section 6.
2 Related work
Hameed et al. [5] presented a cloud-based e-healthcare service framework in Iraq, in
which all the patient data is recorded in a central database. The proposed framework
is based on Service Oriented Architecture (SOA) and offers different facilities such as
improving cost and time management, storing patient profiles and choosing the right
specialist. Parekh and Saleena [9] presented a cloud based framework for the health
care system using clustering as the data mining technique. This approach enabled
patients to obtain the required health services through a mobile application with a
few clicks. Vijayarani and Dhayanand [16] used the Naïve Bayes and Support Vector
Machine (SVM) algorithms to predict kidney disease. The performance analysis was based
on accuracy and execution time. The experimental results showed that the SVM technique
provides better accuracy than Naïve Bayes. Hsu et al. [6] proposed a model based on
health information for breast cancer risk assessment. In this method, sampling and
dimension reduction techniques were applied to pre-process the testing data.
Thereafter, various classifiers were used for risk prediction. Devi and Shyla [2]
applied data mining techniques for early prediction of diabetes. They used 768
instances from the PIMA Indian dataset to determine accuracy. Their analysis indicates
that the J48 classifier has better accuracy than the other techniques.
Turanoglu-Bekar et al. [13] used various decision tree data mining techniques (such as
NB tree, J48, LADTree, BFTree, LMT, Random Forest etc.) for prediction of thyroid
disease. The performance analysis was based on six performance metrics: accuracy, MAE,
PRE, REC, FME, and Kappa statistics. Their results showed that NBTree has higher
accuracy than the other algorithms.
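Several of the studies above compare classifiers on metrics derived from a confusion
matrix. As a minimal sketch, assuming that PRE, REC and FME stand for precision, recall
and F-measure, the following Python code computes accuracy, precision, recall,
F-measure and Cohen's kappa for a two-class problem. The function name
binary_metrics and the label lists are hypothetical and serve only as an illustration;
MAE is omitted because it is normally computed on predicted probabilities, which this
toy example does not have.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, F-measure and Cohen's kappa."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    # Kappa compares the observed agreement with the agreement expected by chance.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Hypothetical labels, for illustration only (1 = disease present).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Kappa is useful next to accuracy because it discounts the agreement that a classifier
would reach by chance on an imbalanced dataset.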
Verma et al. [15] proposed a Coronary Artery Disease (CAD) prediction method using
particle swarm intelligence and the K-means algorithm for risk factor identification.
They deployed various learning algorithms such as Multi-Layer Perceptron (MLP),
Multinomial Logistic Regression (MLR), fuzzy unordered rule induction algorithm and
C4.5 for extracting data events. The dataset was collected from the Department of
Cardiology, Indira Gandhi Medical College, Shimla, India. This dataset consists of 26
features and 335 instances. The experimental results show that MLR achieved the highest
accuracy of 88.4%. Forkan et al. [4] proposed a predictive model, ViSiBiD, for daily
monitoring and disease prevention based on analyzing the vital signs of patients.
Machine learning techniques along with a Map-Reduce implementation were used for
learning on the cloud platform. They used a publicly available dataset of 4893 patients
and observed six bio-signals that deviated from normal, together with different
features, to identify the data events. Data events are collected at intervals of
1–2 h. Their results show that random forest has the best accuracy, 95.85%, compared
to the other techniques. Jahangir et al. [7] designed an application of automatic
Multi-Layer Perceptron (AutoMLP) for diabetes prediction. This technique also uses
enhanced class outlier detection. It performs parameter tuning automatically during
the training process. The outlier detection is performed during pre-processing of the
data. Osman and Aljahdali [8] proposed a technique to diagnose diabetes based on
improved feature extraction. For classification, they considered SVM and K-means
techniques. The experimental analysis shows that the proposed technique provides a
better classification rate for diabetes diagnosis than existing techniques. Zhang et
al. [17] designed a novel cancer prediction technique based on statistical learning
theory. The technique can be applied to binary classification and multi-class
problems. The SVM technique creates large-margin hyperplanes in high dimensional
space, which maximizes the separation between data points,
Multimedia Tools and Applications (2019) 78:19905–19916 19909
and support vectors are used to create the hyperplane. The SVM provides better accuracy
but is expensive in terms of computational time. From the literature review, it is
observed that most existing researchers have ignored the use of meta-heuristic
techniques to improve the classification rate of existing machine learning techniques.
So, in this paper, an efficient machine learning algorithm, Random Forest, has been
used to improve the classification results.
3 Proposed system
The Internet of Things (IoT) is a technology-enabled network of networks. IoT is able
to connect remote and mobile things or machines through wireless communication among
low-cost sensors, assisted by computing and storage devices. As IoT has connected
everything through the web, this connected system offers new opportunities to improve
activities across manufacturing, agriculture, the economy and medical services. An IoT
based health care monitoring system comprises “internet on health sensor things”. It
contains big data that is difficult for the physician to maintain. However, physicians
need to use this historical data to predict the health of their patients. Many machine
learning techniques have been used in the past few decades for health care
applications. However, standard machine learning techniques suffer from parameter
tuning issues. Therefore, efficient tuning of these parameters can improve the
performance of existing machine learning techniques in predicting various medical
conditions such as cancer, diabetes, brain tumor, etc. The process of exploring data to
find hidden associations and predict future patterns has a long history. It is
sometimes referred to as “knowledge discovery in databases”. Its foundation comprises
three intertwined disciplines: statistics, artificial intelligence and machine
learning. Using these three disciplines, the authors plan to produce a framework that
allows electronic remote health monitoring built on IoT and assisted by cloud
computing. Since such scenarios already exist, one can add the advantages of machine
learning to the cloud holding the medical services database in order to provide
continuous monitoring of patient status and manage medical treatment for each user of
the framework. The authors propose a decision support system framework in the cloud,
grounded in machine learning techniques that can be applied to cloud data and whose
performance can be checked. Patients range from those equipped with all easily
available and affordable sensors at their respective homes to those situated at remote
locations or at a distance from medical provisioning. The third kind of service seeker
includes all the patients visiting clinics or laboratories that do not have doctors
physically present, but are equipped with all the necessary medical devices and
assisting staff who act as intermediaries. They upload the healthcare information
collected from patients using the sensor systems onto the cloud for the doctors to
access and respond. All the collected health care information is first transferred to
mobile devices through a sensor network. The sensor network may use Bluetooth, Wi-Fi
or a USB based connection. Mobile phones act as the IoT agent and are used to send the
health information of a patient onto the cloud. The cloud handles the growing volume
of health information, shares the data across health care systems and makes data
mining feasible, which is an important and
crucial exercise in performing work. The implementation so far in this research
involves applying major machine learning techniques, namely K-NN, Linear-SVM, Decision
Trees, Random Forest and MLP, to datasets for the following diseases: breast cancer,
diabetes, heart disease, spect_heart, thyroid, surgery data, dermatology and liver
disorder. Random forest is among the most accurate of the existing supervised learning
algorithms for classification and runs efficiently on large databases. A random forest
classifier builds a set of decision trees, each from a randomly selected subset of the
training set. It then aggregates the votes from the different decision trees to decide
the final class of the test object. In the present work, the random forest classifier
is used to extract the relevant information from the database for the query dataset of
each disease. The block diagram of the proposed system is illustrated in Fig. 1. The
results of the techniques are compared based on their accuracy and Area Under Curve
(AUC), as shown in Figs. 2 and 3.
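The two ideas in the random forest description, training each tree on a randomly
selected subset of the training set and aggregating the votes of the trees, can be
illustrated with a minimal Python sketch. This is not the WEKA implementation used in
the experiments: to stay short it grows one-level trees (decision stumps) on a single
randomly chosen feature, whereas a full random forest grows deeper trees. The names
fit_stump, fit_forest and predict, the seed, and the toy data are all assumptions made
for the example.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, rng):
    """Fit a one-level decision tree on one randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    values = sorted({row[feature] for row in rows})
    default = majority(labels)
    best = None  # (errors, threshold, left_label, right_label)
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = sum(y != left_label for y in left)
        errors += sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # the feature has a single distinct value in this sample
        return lambda row: default
    _, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]
        forest.append(
            fit_stump([rows[i] for i in sample], [labels[i] for i in sample], rng)
        )
    return forest


def predict(forest, row):
    """Aggregate the votes of all trees and return the majority class."""
    return majority([tree(row) for tree in forest])


# Toy, well-separated two-feature data (hypothetical, for illustration only).
rows = [(i * 0.3, (9 - i) * 0.3) for i in range(10)]
rows += [(7 + i * 0.3, 7 + (9 - i) * 0.3) for i in range(10)]
labels = [0] * 10 + [1] * 10
forest = fit_forest(rows, labels)
print(predict(forest, (1.0, 1.0)), predict(forest, (9.0, 9.0)))
```

Because every tree sees a different bootstrap sample, a single tree that is misled by
its sample is outvoted by the others, which is the reason for aggregating votes.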
A few data mining techniques, namely k-Nearest Neighbor, Linear-SVM, Decision Tree,
Random Forest and MLP, are used and their performance is compared to ensure quality of
service to the healthcare industry. The K-Nearest Neighbor technique finds the known
data points nearest to an unidentified data point and classifies that point according
to a voting system. The K-NN technique is easy to implement, but it requires large
storage space, is sensitive to noise and has a slow testing procedure [12].
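The voting rule of K-NN can be sketched in a few lines of Python. The sketch assumes
Euclidean distance and k = 3, neither of which is stated in the text, and the function
name knn_predict, the feature vectors and the class names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Every training point is compared with the query, which is why K-NN needs
    # the whole training set in storage and is slow at test time.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature readings, for illustration only.
train_rows = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_labels = ["healthy", "healthy", "healthy", "at_risk", "at_risk", "at_risk"]
print(knn_predict(train_rows, train_labels, (1.5, 1.5)))
print(knn_predict(train_rows, train_labels, (8.5, 8.5)))
```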
Fig. 1 Block diagram of prediction model for IoT based healthcare network
The Linear Support Vector Machine technique is based on statistical learning theory.
The technique can be applied to binary classification and multi-class problems. The
SVM technique creates large-margin hyperplanes in high dimensional space, which
maximizes the separation between data points, and support vectors are used to create
the hyperplane. The SVM provides better accuracy, but is expensive in terms of
computational time [12]. The Decision Tree technique is based on a tree-like graph, in
which three kinds of nodes (non-leaf, leaf and branch nodes) are used as different
attributes for calculating the conditional probabilities. The node at the top is the
root node, a leaf node works as a class label, branch nodes are the outcomes of a
test, and non-leaf nodes denote the test. The decision tree technique does not require
domain knowledge. It is also easy to interpret and can handle categorical and
numerical data. On the other hand, its performance depends on the dataset and it is
restricted to one output attribute [12]. Random
Forest was developed by Leo Breiman. The technique has been adopted in a number of
applications such as classification, prediction, variable study and selection [14].
The technique has several useful characteristics: it can be applied to two-class or
multi-class problems and to a mixture of continuous and categorical predictors. On the
other hand, fine-tuning of input parameters is required for better performance [3].
MLP is a feed-forward artificial neural network technique. In this technique, the
network calculates the error at the output and propagates it back to the hidden layers
to update the internal weights [1]. The MLP technique has a high prediction accuracy
as compared to other techniques.
4 Significance of the proposed work
The IoT
has many health care applications such as remote health monitoring, fitness programs,
medication at home by healthcare providers, etc. Health care services based on IoT are
expected to reduce health diagnostic costs and increase the quality of health care
management. Because of distant physical locations, manual health care services are
sometimes unable to arrive on time, which causes severe danger to life. So, the main
motivation behind using a health care system based on IoT is to use up-to-date health
care networks driven by wireless technologies in order to support chronic disease
management, early diagnosis, real-time monitoring and medical emergencies. Moreover,
the IoT provides efficient scheduling of limited resources by ensuring their best use
and service to patients. According to a report by the United Nations Population Fund,
people aged 60 and older make up 12.3% of the global population, and by 2050 that
number will rise to almost 22%. In such a scenario, there is going to be a huge need
for medical assistance to the elderly population, in which IoT based health care
systems will play a vital role in bridging the gap between doctors and patients.
5 Experimental results and performance analysis
In this section, the results of the healthcare system using a few machine learning
algorithms, namely K-NN, Support Vector Machine, Decision Trees, Random Forest, and
MLP, are presented. The authors have used a number of public datasets for different
diseases, namely breast cancer, diabetes, heart disease, Spect-heart, thyroid, surgery,
dermatology and liver disorder, as shown in Table 1. These datasets were downloaded
from https://archive.ics.uci.edu/ml/datasets.html. For the experimental work, the
dataset for each disease is divided into training and testing sets, i.e. 80% of the
data as the training set and the remaining 20% as the testing set. To assess the
performance of the various machine learning algorithms, the authors have used the
Waikato Environment for Knowledge Analysis (WEKA) open source tool. Disease-wise
comparative analyses of accuracy and Area Under Curve (AUC) for the different machine
learning techniques are presented in Tables 2 and 3, respectively. These results are
also shown graphically in Figs. 2 and 3. On the breast cancer dataset, which contains
a total of 699 samples in 2 classes, the highest accuracy of 96.42% was achieved by two
machine learning techniques, namely Linear SVM and Random forest. Random forest
provides the highest diagnostic rate (AUC) of 99.6%. The diabetes dataset contains 768
samples and 2 classes. On diabetes, Random forest provides the highest accuracy and
diagnostic rate, 81.16% and 85.2%, respectively. The heart disease dataset comprises a
total of 303 samples in 5 classes. The highest accuracy of 57.37% on this dataset was
achieved using Linear-SVM, while Random forest provides the highest diagnostic rate of
82.3%. The Spect_Heart dataset contains 187 samples and 2 classes. On this dataset,
Random forest reports the highest accuracy and diagnostic rate, 72.19% and 75.8%,
respectively. The thyroid dataset contains a total of 9172 samples and 6 classes.
Random forest provides the highest accuracy and diagnostic rate, 70.12% and 81.4%,
respectively, on this dataset. Surgery_Data comprises 470 samples in 2 classes. On
this dataset, the highest accuracy of 81.91% and the highest diagnostic rate of 57.9%
were achieved by Linear-SVM and MLP, respectively.
The dermatology dataset contains 366 samples and 6 classes. The highest accuracy on
this dataset, 97.26%, was reported by Random forest. The highest diagnostic rate of
99.9% was observed for Random forest and MLP. The Liver_Disorder dataset comprises a
total of 345 samples in 2 classes. On this dataset, Linear-SVM provides the highest
accuracy of 69.56% and MLP provides the highest diagnostic rate of 80.1%.
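The evaluation protocol described in this section (an 80%/20% split followed by
accuracy and AUC) can be sketched in Python as follows. The experiments themselves were
run in WEKA; this sketch only illustrates the computations. It assumes a shuffled split
with a fixed seed and a two-class AUC computed as the probability that a positive
sample is scored above a negative one, neither of which is specified in the text. The
function names and the toy scores are hypothetical.

```python
import random


def train_test_split(rows, labels, test_fraction=0.2, seed=42):
    """Shuffle the sample indices and hold out `test_fraction` for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted class is correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Two-class AUC: chance that a positive outscores a negative (ties = 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Placeholder rows; 699 matches the breast cancer sample count quoted above.
rows = list(range(699))
labels = [i % 2 for i in rows]
x_train, x_test, y_train, y_test = train_test_split(rows, labels)
print(len(x_train), len(x_test))

# Hypothetical predictions and scores, for illustration only.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

For the datasets with more than two classes, a multi-class AUC needs an averaging rule
(for example one class against the rest), which this sketch does not cover.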
6 Conclusion
The work presented in this paper proposes a health care system based on the random
forest classifier and IoT. The proposed system will improve interactivity between
patients and doctors. Experiments were conducted using various datasets related to
different diseases, namely breast cancer, diabetes, heart disease, spect-heart,
thyroid, surgery, dermatology and liver disorder, to test the effectiveness of the
proposed work. The machine learning techniques employed in this work are K-NN, Support
Vector Machine, Decision Trees, Random Forest, and MLP. A maximum accuracy of 97.26%
was achieved on the Dermatology dataset using the Random Forest machine learning
technique. The analysis shows that, on average, random forest provides accurate
results across the datasets considered. This accuracy can probably be increased by
considering a larger dataset while training the proposed system with different machine
learning algorithms. In the near future, this work can be extended to other
applications such as earth observation, weather forecasting, etc. Security of data is
the main concern in an IoT based framework because IoT devices collect useful data with
the help of various existing technologies and transfer that data to other IoT devices
or systems.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
References
1. Costa K, Ribeiro P, Carmargo A, Rossi V, Martins H, Neves M, Fabris R, Imaisumi R, Papa JP (2013)
Comparison of the techniques decision tree and MLP for data mining in SPAMs detection in computer
networks. Proceedings of the 3rd international conference on innovative computing technology, pp
344–348
2. Devi MR, Shyla JM (2016) Analysis of various data mining techniques to predict diabetes mellitus. Int J
Appl Eng Res 11(1):727–730
3. Diaz-Uriarte R, Alvarez de Andres S (2006) Gene selection and classification of microarray data using
random forest. BMC Bioinformatics. https://doi.org/10.1186/1471-2105-7-3
4. Forkan ARM, Khalil I, Atiquzzaman M (2017) ViSiBiD: a learning model for early discovery and real time
prediction of severe clinical events using vital signs as big data. Comput Netw 113:244–257
5. Hameed RT, Mohamad OA, Hamid OT, Tapus N (2015) Design of e-healthcare management system
based on cloud and service oriented architecture. Proceedings of the 5th IEEE international conference on
E-health and bioengineering (EHB), pp 1–4
6. Hsu JL, Hung PC, Lin HY, Hsieh CH (2015) Applying under-sampling techniques and cost-sensitive learning
methods on risk assessment of breast cancer. J Med Syst 39(4). https://doi.org/10.1007/s10916-015-0210-x
7. Jahangir M, Afzal H, Ahmed M, Khurshid K, Nawaz R (2017) An expert system for diabetes prediction
using auto tuned multi-layer perceptron. Proceedings of the intelligent system conference, pp 722–728
8. Osman AH, Aljahdali HM (2017) Diabetes disease diagnosis method based on feature extraction using
k-svm. Int J Adv Comput Sci Appl 8(1):236–244
9. Parekh M, Saleena B (2015) Designing a cloud based framework for healthcare system and applying
clustering techniques for region wise diagnosis. 2nd international symposium on big data and cloud
computing (ISBCC’15), 50:537–542
10. Tao D, Wen Y, Hong R (2016) Multicolumn bidirectional long short-term memory for mobile devices-based
human activity recognition. IEEE Internet Things J 3(6):1124–1134
11. Tao D, Cheng J, Gao X, Li X, Deng C (2017) Robust sparse coding for Mobile image labeling on the cloud.
IEEE Trans Circuits Syst Video Technol 27(1):62–72
12. Tomar D, Agarwal S (2013) A survey on data mining approaches for healthcare. International Journal of
Bio-Science and Bio-Technology 5(5):241–266
13. Turanoglu-Bekar E, Ulutagay G, Kantarc-Savas S (2016) Classification of thyroid disease by using data
mining models: a comparison of decision tree algorithm. Oxford Journal of Intelligent Decision and Data
Science 2:13–28
14. Verikas A, Gelzinis A, Bacauskiene M (2011) Mining data with random forest: a survey and results of new
tests. Pattern Recogn 44(2):330–349
15. Verma L, Srivastava S, Negi PC (2016) A hybrid data mining model to predict coronary artery disease cases
using non-invasive clinical data. J Med Syst 40(7):178
16. Vijayarani S, Dhayanand S (2015) Data mining classification algorithms for kidney diseases prediction.
International Journal on Cybernetics & Informatics 4(4):13–25
17. Zhang L, Zhou W, Wang B, Zhang Z, Li F (2018) Applying 1-norm svm with squared loss to gene selection
for cancer classification. Appl Intell 48(7):1878–1890
Pavleen Kaur graduated in Computer Science & Engineering from Thapar Institute of Engineering and
Technology, Patiala, India in 2014. Presently, she is pursuing her Masters in Software Engineering from Thapar
Institute of Engineering and Technology, Patiala, India. Her industrial experience includes working as a
Business Analyst intern, and her current research work includes the use of Deep Learning in Machine
Translation and Big Data Mining.
Ravinder Kumar received the Ph.D. degree in computer science and engineering from Thapar University in
2015. He is currently an Assistant Professor with the Computer Science and Engineering Department, Thapar
University. He has already developed a complete working project on speech recognition and handwriting
recognition for an Indian regional language (Punjabi). His areas of research include theoretical and practical
aspects of combinatorial optimization, approximation algorithms, and mathematical programming. He is a
member of various professional bodies and serves as a reviewer for many refereed journals.
Munish Kumar received his Master’s degree in Computer Science & Engineering from Thapar University,
Patiala, India in 2008. He received his Ph.D. degree in Computer Applications from Thapar University, Patiala,
India in 2015. He started his career as an Assistant Professor in computer applications at the Jaito Centre of
Punjabi University, Patiala. Presently, he is working as an Assistant Professor in the Department of
Computational Sciences, Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India. His
research interests include Character Recognition, Computer Vision and Pattern Recognition.
Affiliations
Pavleen Kaur
amicablepavleen@gmail.com
Ravinder Kumar
ravinder@thapar.edu
1 Computer Science and Engineering Department, Thapar Institute of Engineering & Technology, Patiala,
Punjab, India
2 Department of Computational Sciences, Maharaja Ranjit Singh Punjab Technical University, Bathinda,
Punjab, India