Personal and Ubiquitous Computing
https://doi.org/10.1007/s00779-017-1106-1
ORIGINAL ARTICLE
Mining intelligent solution to compensate missing data context
of medical IoT devices
Paul S. Fisher 1 & Jimmy James II 1 & Jinsuk Baek 1 & Cheonshik Kim 2
Received: 19 November 2017 / Accepted: 7 December 2017
© Springer-Verlag London Ltd., part of Springer Nature 2017
Abstract
When gathering experimental data generated by medical IoT devices, the perennial problem is missing data, whether due to failures of the recording instruments, introduced errors that cause data to be discarded, or data that is simply missed and lost. When faced with this problem, the researcher has several options: (1) insert what appears to be the best replacement for the missing element, (2) discard the entire instance, or (3) use one of the algorithms that consider the data and then suggest viable candidate values for replacement. We discuss these options and introduce another mining intelligent technique based upon Markov models.
Keywords Data mining · Missing data · Markov model · IoT
1 Introduction
The Internet of Things (IoT) [1] allows physical, electronic devices to be sensed or controlled remotely across the available network infrastructure. These devices are embedded with software, sensors, and actuators that inter-operate within the existing network domain. Since its autonomous nature yields economic benefit, the IoT is expected to be widely deployed in many applications requiring integration of the physical world into computer-based systems. One of the most important issues in an IoT network is how to effectively manage the missing data sent by multiple heterogeneous IoT devices [2].

What does the phrase "missing data" really mean? Missing data has several sources; two of the most common are that the subject did not cooperate or was absent, or that the data were lost through some random event. In the latter case, the subject of our experiment was available, but something went awry in the collection process [3, 4]. When the missing data are due to non-cooperation of the subject, or the subject is missing at the time the data are to be collected, there is nothing to be done. The case of random deletions or errors in the data is the one we address in the following sections of this paper. We examine two widely used systems, describe our approach, and then provide a comparison between the three approaches. These are not the only approaches to determining missing data values; a short synopsis is available in [4]. One can use statistical estimation based upon the Bayesian formula, decision trees, or clustering algorithms. Alternatively, one can simply examine the data and fill in the missing values by a best-guess technique, which is often too laborious for large data sets.

The motivation and contribution of this research come from the following idea. A researcher may wish to see how the missing data element was determined, that is, what chain of events caused this particular system to select the value that it did. Given that the chain is available, the researcher can determine whether the conclusion is valid. Once the user has examined the derivation of rules for that element in each instance, the system can then fill in the appropriate values directly using the historical set of rules.

* Paul S. Fisher
fisherp@wssu.edu

Jimmy James, II
jjames110@rams.wssu.ed

Jinsuk Baek
baekj@wssu.edu

Cheonshik Kim
mipsan@sejong.ac.kr

1 Department of Computer Science, Winston-Salem State University, Winston-Salem, NC, USA
2 Department of Computer Science and Engineering, Sejong University, Seoul, South Korea
Table 1 First 15 instances of the mammographic mass data set with age quantized
Consider data representing US family incomes: if the average income of a US family is X, then that value can be used to replace missing income values. We think there is a better way. Instead of placing a statistical value in place of the missing data, we think that likely values derived from the existent data give the researcher the opportunity to examine multiple scenarios and then select the value that seems most reasonable. In any case, the final decision must rest with the researcher until a level of confidence is established. Then, as mentioned, the system can simply fill in the missing values for this type of data. This presupposes that the data provide some relationship between the missing element and the other elements that are present. If there is no correlation, then a guess or some statistical value is the only recourse.
Missing values can be determined using regression, inference-based tools using the Bayesian formalism, decision trees, or clustering algorithms (K-means/median, etc.). For example, clustering algorithms can be used to create clusters of rows, which are then used to calculate an attribute mean or median. We propose another process, using a Markov model to predict the probable values of the missing attribute from the other attributes in the data.
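As a concrete illustration of the cluster-based option above, consider the following Python sketch. This is not code from the paper: the instance values merely mirror Table 1's format, and grouping by the class label stands in for a real clustering step.

```python
from statistics import median

MISSING = 9   # the data set's missing-value marker

# Illustrative instances in Table 1's format:
# (BI-RADS, age group, shape, margin, density, severity)
rows = [
    [5, 7, 1, 4, 3, 1],
    [4, 4, 1, 1, MISSING, 1],
    [4, 2, 1, 1, 3, 0],
    [5, 4, 1, MISSING, 3, 0],
    [5, 5, 1, 5, 3, 1],
]

def impute_by_group(rows, attr, group_attr=5):
    """Fill missing values of `attr` with the median of that attribute over
    rows sharing the same `group_attr` value (the class label here stands
    in for a cluster assignment)."""
    filled = [list(r) for r in rows]
    for r in filled:
        if r[attr] == MISSING:
            peers = [s[attr] for s in rows
                     if s[group_attr] == r[group_attr] and s[attr] != MISSING]
            if peers:              # leave the gap if no peer has a value
                r[attr] = int(median(peers))
    return filled

fixed = impute_by_group(rows, attr=4)      # repair the density column
print(fixed[1])                            # [4, 4, 1, 1, 3, 1]
```

A statistical fill of this kind is exactly the baseline the paper argues against: it yields one value with no derivation chain for the researcher to inspect.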
Instance   BI-RADS, Age group, Shape, Margin, Density, Severity
0          5,7,1,4,3,1
1          4,4,1,1,9,1
2          4,7,9,9,3,0
3          3,4,2,1,3,1
4          5,5,4,5,3,1
5          4,2,1,1,3,0
6          5,4,1,9,3,0
7          5,5,1,5,3,1
8          4,6,1,9,3,0
9          4,3,3,1,2,0
10         5,7,1,5,9,1
11         5,6,9,5,1,1
12         4,6,2,1,2,0
13         4,6,1,9,3,0
14         5,6,3,5,3,1
2 Markov models

A Markov chain is a mathematical model for systems whose states, discrete or continuous, are governed by a transition probability. Another way to handle a data set with an arbitrary missing data pattern is to use the Markov chain Monte Carlo (MCMC) approach [5] to impute enough values to make the missing data pattern monotone; for example, items are ordered such that if item b is missing, then items b + 1, ..., n are also missing. Then one can use a more flexible imputation method [6]. A Markov chain arises from a sequence of random variables whose only dependence is serial. Using this approach allows the examination of "what happens next." In order to consider sequences of symbols in the Markov model, we need to assert that the symbol in the next position depends only upon the present value of one or more symbols in the immediate past. That is, no next symbol can be thought of as depending upon some distant symbol(s) in the past. The Markov chain can be thought of as the probability of moving from one state to the next state. We can represent this by the probabilities expressed as follows:

Pr(Xn+1 = x | Xn = xn)    (1)

The equation in (1) implies a probability distribution overlaying a state space where the exit from a state, that is, a state transition, depends only on the previous state having the value xn. One can easily generalize this to a collection of previous states Xn−1, Xn−2, ..., Xn−k, giving a model of order k, in which each next state is determined by the last k symbols. The approach is based upon data mining techniques whose objective is to determine a set of rules that define cause and effect. The Markov model provides the framework for the cause and effect, but in its first-order form (where the next value depends only upon the previous value) it is not sufficient for the richness of the typical data that results from some observed or measured phenomenon.
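A first-order transition distribution of the kind in (1) can be estimated by simple pair counting. The following sketch is illustrative only: the symbol stream is invented and the function name is ours, not the paper's.

```python
from collections import Counter, defaultdict

def transition_probs(sequence):
    """Estimate Pr(X_{n+1} = x | X_n = x_n) by counting adjacent pairs."""
    counts = defaultdict(Counter)
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    # Normalize each state's counts into a conditional distribution.
    return {state: {nxt: c / sum(seen.values()) for nxt, c in seen.items()}
            for state, seen in counts.items()}

# Toy symbol stream, not the paper's data:
probs = transition_probs([1, 3, 1, 3, 2, 1, 3])
print(probs[3])   # {1: 0.5, 2: 0.5}
```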
Table 2 Examples from the dataset with Markov rules of order 2 ("S" is the start symbol)

Instance       Group ID   Markov rules: order 2
1,4,2,4,3,0    0          S1 → 4, 14 → 2, 42 → 4, 24 → 3, 43 → 0
1,5,4,4,3,0    1          S1 → 5, 15 → 4, 54 → 4, 44 → 3, 43 → 0
3,2,2,1,3,0    2          S3 → 2, 32 → 2, 22 → 1, 21 → 3, 13 → 0
3,3,1,1,3,0    3          S3 → 3, 33 → 1, 31 → 1, 11 → 3, 13 → 0
3,5,2,1,9,0    4          S3 → 5, 35 → 2, 52 → 1, 21 → 9, 19 → 0
2,4,1,1,3,0    5          S2 → 4, 24 → 1, 41 → 1, 11 → 3, 13 → 0
2,4,9,9,4,0    5          S2 → 4, 24 → 9, 49 → 9, 99 → 4, 94 → 0
2,5,1,1,3,0    6          S2 → 5, 25 → 1, 51 → 1, 11 → 3, 13 → 0
2,5,9,4,3,0    6          S2 → 5, 25 → 9, 59 → 4, 94 → 3, 43 → 0
2,6,9,1,2,0    7          S2 → 6, 26 → 9, 69 → 1, 91 → 2, 12 → 0
2,6,1,1,9,0    7          S2 → 6, 26 → 1, 61 → 1, 11 → 9, 19 → 0
Further, a single effect is likewise not sufficient for our approach. A better view of this model is a random walk through a graph, where each vertex can have one or more outgoing edges. The selection of the path depends upon the previous n states, which may differ for each selection. This makes our model non-stationary.
3 The test data
Whether the data set we are trying to repair obeys the assumptions defined for the Markov model needs to be examined. For our test data, we have selected the Mammographic Mass data set [4, 7] as an example to illustrate this process. We observe that the proposed method works well with medical data. In the medical process, a physician examines symptoms, physical characteristics of the patient, and test results; looking at the totality of these measures gives the physician the ability to determine whether the patient potentially has cancer. The original dataset consists of 961 instances, and each instance contains six parameters. Of these instances, 516 are benign and 445 are malignant.
1. BI-RADS assessment: 1 to 5 (ordinal)
2. Age: patient's age in years (integer)
3. Shape: mass shape: round = 1, oval = 2, lobular = 3, irregular = 4 (nominal)
4. Margin: mass margin: circumscribed = 1, microlobulated = 2, obscured = 3, ill-defined = 4, spiculated = 5 (nominal)
5. Density: mass density: high = 1, iso = 2, low = 3, fat-containing = 4 (ordinal)
6. Severity: benign = 0 or malignant = 1 (binomial)
A few instances are shown in Table 1. In this table, a "9" represents a missing data element. In order to process the data, we grouped the age parameter into clusters spanning 10 years; leaving the age as is makes the instances too unique to provide good results. Thus a "4" in the age category indicates an age from 40 to 49.
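The decade grouping amounts to an integer division. A minimal sketch (the record values are hypothetical, not taken from the data set):

```python
def quantize_age(age):
    """Map a raw age in years to its decade group, e.g. 40-49 -> 4."""
    return age // 10

# Hypothetical raw record: (BI-RADS, age, shape, margin, density, severity)
record = [5, 67, 3, 5, 3, 1]
record[1] = quantize_age(record[1])   # age 67 -> group 6
print(record)                         # [5, 6, 3, 5, 3, 1]
```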
Considering the subset of the data for this investigation, we show a portion of the resulting data set assuming a stationary Markov model of order 2 (in general, the model will not be stationary). We have also realigned the age groups to run from 2 to 7. Table 2 shows a snippet of the data used in this experiment together with the rules defined by each instance under the Markov model. In Table 2, the Group ID is determined by the first two values in the instance. Again, a "9" indicates a missing value. We have added the symbol "S" to indicate the start of a sequence; it may be left off, as it makes little difference, but it is useful for identifying rules. As we can see, each rule consists of two parts: one is the antecedent and the other
Table 3 Examples of non-deterministic rules derived from Table 2
11 → 3[8] 4[1] 9[1] 2[1]
12 → 0[5]
13 → 0[9]
14 → 0[1] 2[1]
15 → 4[1]
19 → 0[3] 9[2] 1[1]
21 → 1[3] 9[3] 3[3] 2[3]
22 → 1[2]
23 → 1[1] 2[1]
24 → 3[2] 1[1] 2[1] 4[1] 9[2]
25 → 1[2] 9[1]
26 → 9[1] 1[1]
27 → 1[1]
29 → 3[1]
31 → 1[2]
32 → 1[4] 2[1] 3[1] 0[1]
33 → 1[2] 2[1] 4[1] 0[4]
34 → 4[2] 1[4] 2[5] 3[4] 9[1]
35 → 2[3] 1[1] 3[1] 4[1]
is its consequent. Each rule has an antecedent of order 2 that uniquely identifies the consequent.
In order to replace the missing data element, we built a table that includes the rules from Table 2, modifying the model to allow rules where the symbol on the right of the arrow may be non-deterministic. To do so, we use the rules that match the left-hand side, and then, using the substituted value, we see whether there are rules that permit that substitution. We also include, within the "[]" of each particular rule, the number of instances of that rule.
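The construction of such a rule table can be sketched as follows. This fragment is our illustration, not the authors' implementation: the three instances merely mirror Table 2's format, and symbols are assumed to be single digits so that an antecedent pair can be written as a two-character string.

```python
from collections import Counter, defaultdict

def rule_table(instances, order=2):
    """Build rules 'ab -> c [count]' in the style of Tables 2 and 3.
    The antecedent is the previous `order` symbols ('S' marks the start);
    consequents may be non-deterministic, so each antecedent maps to a
    Counter of consequents with their occurrence counts."""
    rules = defaultdict(Counter)
    for inst in instances:
        seq = ['S'] + [str(v) for v in inst]   # symbols assumed single digits
        for i in range(len(seq) - order):
            rules[''.join(seq[i:i + order])][seq[i + order]] += 1
    return rules

# Three instances mirroring Table 2's format; 9 marks missing data:
data = [
    [1, 4, 2, 4, 3, 0],
    [2, 4, 1, 1, 3, 0],
    [2, 4, 9, 9, 4, 0],
]
rules = rule_table(data)
print(dict(rules['24']))   # consequents observed after the pair 2 4
```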
Table 3 again provides a few of the rules derived for the test dataset. In Table 3, the rules show, for example, that a 1 1 can be followed by a 3 (there are 9 such instances), by a 4 (only 1 instance), by a "9" representing an unknown (1 instance), and lastly by a 2, which occurs only one time. Clearly, in this situation, the first guess for the missing data value would be a 3. If a 3 replaces the "9," then the rule has to be checked for consistency with the other rules. Also, from the rules of Table 3, the rule 1 3 → 0 [9] should not be utilized as a pattern divider if the objective is to segregate the rules into classes that distinguish certain characteristics.
Fig. 1 Venn diagram for non-deterministic rules. The three sets are labeled BENIGN, MALIGNANT, and UNKNOWN; the rules placed in the diagram include 4 4 → 3 [9], 3 1 → 3 [5], 2 1 → 3 [25], 3 4 → 3 [2], 4 1 → 1 [85], 4 2 → 1 [65], 1 3 → 1 [6], 2 3 → 0 [5], 1 3 → 0 [57], 4 3 → 1 [2], 5 9 → 1 [1], 9 3 → 0 [2], 1 1 → 9 [6], 1 9 → 9 [2], 2 1 → 9 [4], and 4 2 → 9 [1]

Figure 1 shows a Venn diagram constructed to show the potential relationship between rules that distinguish subsets of the data. The rules in Table 3 are placed without contextual influence and are shown only as an illustration of the patterns. It is possible to draw some conclusions from the data as portrayed. First, the common intersection of all three sets defines rules that are non-discriminatory; in the data set used, these rules represented most of the instances. Next, if we have a rule 3 4 → "9," it can be replaced by values from one of four rules, and the most likely replacement would be the rule 3 4 → 2, as there are 5 occurrences of this rule.

In order to check the consistency of the substitution from the triple 3 4 2, the rule 4 2 → ? needs to be checked. From our test data, the rule that matches this condition is

4 2 → 4 [1] 1 [5]    (2)

We also could have chosen 3 4 → 1 [4] or 3 4 → 3 [4]. In this case, the rule 4 1 → 1 [85], which is essentially a non-discriminatory rule, may be the choice, but it may not help determine the correct result. For the moment, we substitute the 1 in place of the "?," so the original rule becomes 4 2 → 1. Checking this rule, we have the new rule 2 1 → ?:

2 1 → 1 [3] 9 [3] 3 [3] 2 [3]    (3)

From the choices that we make in (2) and (3), we can go back to the original input data and see what each new rule defines in terms of the actual data. In the case of (3), the unknown symbol can be replaced by any of the suitable candidates, as they are all equally likely. In a larger set this would most likely not occur, but it is possible. At this point, we can provide the choices and the ramifications of the choices and then let the researcher make the decision. These relationships can also be shown in a tree structure, as pictured in Fig. 2.

Fig. 2 Tree structure identifying possible resolution paths. The nodes shown include 3 4 → 9, its candidate substitutions 3 4 → 2 [5], 3 4 → 3 [2], and 3 4 → 9 [2], and the follow-on rules 4 2 → 1 [65] and 4 2 → 9 [1]

Figure 2 shows the most likely path with bold lines. The observed value 3 4 → 9 has four candidate substitutions. Selecting the most likely, there are two further possibilities, with 4 2 → 1 [4] being the most likely. So the choice of 3 4 → 2 seems realistic. Figure 2 also shows that the choice of 3 4 → 2, with 5 occurrences, and two of the other choices are also good possibilities and so should likewise be considered.

Fig. 3 Patterns that are left unchanged by both techniques
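The substitute-then-check walk described above can be sketched in a few lines. The rule counts below are hypothetical (loosely modeled on Table 3), and the function names are ours:

```python
from collections import Counter

# Hypothetical slice of a non-deterministic rule table (Table 3 style):
RULES = {
    '34': Counter({'2': 5, '1': 4, '3': 4, '9': 2}),
    '42': Counter({'1': 5, '4': 1}),
    '21': Counter({'1': 3, '9': 3, '3': 3, '2': 3}),
}

def ranked_candidates(pair, rules):
    """Candidate replacements after `pair`, most frequent first; '9' (the
    missing-data marker) is never offered as a replacement."""
    return [c for c, _ in rules.get(pair, Counter()).most_common() if c != '9']

def check_substitution(pair, candidate, rules):
    """Forward check: after substituting, the newly formed pair must itself
    have support in the rule table."""
    return ranked_candidates(pair[1] + candidate, rules)

best = ranked_candidates('34', RULES)[0]          # '2', with 5 occurrences
print(best, check_substitution('34', best, RULES))
```

Each step of the walk is recorded by the rules it matched, which is precisely the derivation chain the paper wants to expose to the researcher.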
4 Comparison
WEKA [8] is a collection of machine learning algorithms
written in Java, developed at the University of Waikato,
New Zealand. The algorithms can either be applied directly
to a dataset or called from Java code. WEKA allows the user
to mark missing values; once marked, a filter can be configured to replace the missing values with another value derived
from the values present and defined for use by that filter. One alternative is to use the mean of those values, or one of the other available filters. WEKA contains tools for data preprocessing, classification, regression, clustering, association rules, and visualization.

We used WEKA because of its availability and richness of tools. Using this system gives us a way to examine and compare the results of our Markov model with a well-developed system. The following figures illustrate the counts of the original data and the counts as modified, first by our Markov model and then by WEKA. Figure 3 shows that neither technique changed data that was stable.

Fig. 4 Shape with one missing value "9"

Figure 4 provides a view of the before and after counts as modified first by the Markov model and then by WEKA. For the missing shape data, the Markov model determined that the best value to substitute is a 1, since this lessened the chance of generating a false positive. There is only one instance where the value in shape was unknown, that is, given the value 9. WEKA assigned a 2 to the missing data values.

Fig. 5 Margin with four missing data

Figure 5 displays the margin data values; in this case, there are four unknown instances. From this figure, we see that the Markov model replaced all of the missing values with a 1. WEKA reduced one of the "2s" to a "1" and replaced the other four missing data elements with a "1."

Fig. 6 Density with 15 data missing

Figure 6 shows that density has 15 data values missing. The Markov model assigned one of those values to "1" and one to "2," with the remainder assigned to "3." WEKA left the "1" parameter alone and assigned all of the missing data points the value 3.

5 Conclusion and future work

Now the question comes as to which model provides the best set of answers, and like so much in life, the answer has to reside in the "eye of the beholder." The two models do not differ significantly, which at least tells us that the non-stationary, non-deterministic Markov model, if such perturbations allow us to call it by this name, provides reasonable results. For more complicated relationships, we believe our model will continue to function in the range of other systems. But we think the advantage of this model lies in the ability to track the results after a substitution is made, or better still, the ability to track proposed changes to reduce the instances of false positives. We believe that a model which considers both the past and the future can provide the researcher with the most confidence in the actual predictions. We will further consider various aspects to extend our research, including the overall design of IoT-based healthcare systems [10, 11], how to collect data [12], how to maximize network availability [13], and security [14].
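For orientation, a WEKA-style baseline for a nominal attribute replaces every missing entry with the attribute's most common value. A minimal Python sketch, assuming the paper's convention that 9 marks a missing value (the rows are illustrative, not the full data set):

```python
from collections import Counter

MISSING = 9

def replace_with_mode(rows, attr):
    """Baseline imitation of a mode-replacement filter for a nominal
    attribute: every missing entry in column `attr` gets the mode."""
    observed = [r[attr] for r in rows if r[attr] != MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [[mode if (i == attr and v == MISSING) else v
             for i, v in enumerate(r)] for r in rows]

# Illustrative rows in the data set's format; shape is column 2:
rows = [
    [5, 7, 1, 4, 3, 1],
    [4, 7, 9, 9, 3, 0],   # shape and margin missing
    [3, 4, 2, 1, 3, 1],
    [4, 2, 1, 1, 3, 0],
]
print(replace_with_mode(rows, attr=2)[1])   # [4, 7, 1, 9, 3, 0]
```

Unlike the Markov-rule approach, this fill offers no derivation chain to inspect, which is the trade-off the comparison above is meant to highlight.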
References

1. Chiang M, Zhang T (2016) Fog and IoT: an overview of research opportunities. IEEE Internet Things J 3(6):854–864
2. Ibrahim JG, Chu H, Chen MH (2012) Missing data in clinical studies: issues and methods. J Clin Oncol 30(26):3297–3303. https://doi.org/10.1200/JCO.2011.38.7589
3. Bland M (2015) An introduction to medical statistics, 4th edn. Oxford University Press, Oxford, p 448
4. Lichman M (2013) UCI Machine Learning Repository. University of California, School of Information and Computer Science. http://archive.ics.uci.edu/ml/index.php
5. Kampf E. Data mining–handling missing values the database. https://developerzen.com/data-mining-handling-missing-values-the-database-bd2241882e72
6. Geyer CJ. Introduction to Markov chain Monte Carlo. In: Handbook of Markov Chain Monte Carlo. http://www.mcmchandbook.net/HandbookChapter1.pdf
7. Yuan YC. Multiple imputation for missing data. Encyclopedia of Measurement and Statistics. http://www.ats.ucla.edu/stat/sas/library/multipleimputation.pdf
8. Ali MG, Dawson SJ, Blows FM et al (2011) Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer. Br J Cancer 104(4):693–699. https://doi.org/10.1038/sj.bjc.6606078
9. Chisci G, ElSawy H, Conti A, Alouini MS, Win M (2017) On the scalability of uncoordinated multiple access for the Internet of Things. In: Proc 2017 International Symposium on Wireless Communications Systems. IEEE, Bologna, pp 402–407
10. Ruggeri G, Briante O (2017) A framework for IoT and e-health systems integration based on the social Internet of Things paradigm. In: Proc 2017 International Symposium on Wireless Communications Systems, pp 426–431. https://doi.org/10.1109/ISWCS.2017.8108152
11. Masouros D, Bakolas I, Tsoutsouras V, Siozios K, Soudris D (2017) From edge to cloud: design and implementation of a healthcare Internet of Things infrastructure. In: Proc 27th International Symposium on Power and Timing Modeling, Optimization and Simulation, pp 1–6. https://doi.org/10.1109/PATMOS.2017.8106984
12. Qiu T, Qiao R, Han M, Sangaiah AK, Lee I (2017) A lifetime-enhanced data collection scheme for the Internet of Things. IEEE Commun Mag 55(11):132–137. https://doi.org/10.1109/MCOM.2017.1700033
13. Alsubaei F, Abuhussein A, Shiva S (2017) Security and privacy in the Internet of Medical Things: taxonomy and risk assessment. In: Proc 2017 IEEE Conference on Local Computer Networks Workshops, pp 112–120
14. Frank E, Hall MA, Witten IH (2016) The WEKA Workbench. Online appendix for Data Mining: Practical Machine Learning Tools and Techniques, 4th edn. Morgan Kaufmann, Burlington