Computational Intelligence Methods For The Identification of Early Cardiac Autonomic Neuropathy
978-1-4673-6322-8/13/$31.00 © 2013 IEEE 929
Fig. 2. Normal R-R interval graph.

milliseconds), and this can be plotted against time to produce the graph shown in Fig. 2. This illustrates the natural variation of the RR interval that is indicative of a healthy cardiac system, as the heart rate is continuously varied to adapt to the current needs of oxygenation and perfusion. It is the absence of such variation that can indicate cardiac disease.

HRV has conventionally been analysed with time- and frequency-domain methods. Time-domain methods are generally variations upon moments of RR intervals; frequency-based methods include Fourier analysis; and non-linear methods include fractal analysis and entropy measures. More recent non-linear analysis has shown increased sensitivity for identifying the risk of future morbidity and mortality in diverse patient groups. For example, an estimate of HRV using the standard deviation of RR intervals found that this measure is higher in well-functioning hearts but can be decreased in coronary artery disease, congestive heart failure and diabetic neuropathy [2]. Although HRV is useful in disease detection, when only the standard deviation of RR intervals is used it is no better than the average heart rate, and in fact contains less information for risk prediction after acute myocardial infarction [3]. This indicates that more advanced measures of HRV should be explored. The measures derived from the RR interval and used in this work are now discussed.

The mean of the RR interval is calculated before detrending (see below) and is expressed in milliseconds (ms). The standard deviation of the RR interval (SDNN) is likewise calculated before detrending and is also expressed in ms. The square root of the mean squared difference of successive intervals is abbreviated to RMSSD (ms). The number of pairs of successive intervals that differ by more than 50 ms, divided by the total number of intervals, is known as pNN50. The HRV triangular index is based on estimating the density distribution of RR intervals: the area under the curve (integral) provides the number of all intervals, and this is divided by the maximum value of the distribution [4]. The triangular interpolation of the interval histogram (TINN) is the estimated width of the density distribution. The Poincaré plot is a visual representation of the time series, constructed by plotting each consecutive RR interval as a point where y = RR_t and x = RR_{t−1}. An ellipse is fitted to the resulting cloud of points, and the minor and major axes of the ellipse are estimated as SD1 and SD2, both expressed in ms [5].

Frequency-domain methods divide the spectral distribution into regions labelled as low or high frequency. Low frequency is regarded as the range 0.04-0.15 Hz, while high frequency is the range 0.15-0.4 Hz. The peak value is expressed in Hz, while the power is expressed in ms² or, when normalised, in normalised units (n.u.). The ratio of low- to high-frequency components is also calculated, as well as the total power (ms²) [4].

Detrended Fluctuation Analysis (DFA) was introduced for characterising the RR interval by [6]. To obtain DFA alpha1 (the short-time correlation exponent) and alpha2 (the long-time correlation exponent), the RR time series is first integrated and the integrated series y(k) divided into segments of equal length n. The trend in RR intervals in each of the segments is determined by fitting a least-squares line to the data. The series y(k) is detrended by subtracting the local trend, y_n(k), in each segment. The average fluctuation at segment size n, denoted F(n), is given by:

    F(n) = sqrt( (1/N) Σ_{k=1}^{N} [y(k) − y_n(k)]² )    (1)

where N is the total length of the heart rate signal and k indexes the samples of the signal. The calculation of F(n) is repeated over segment sizes from 4 beats to 64 beats to provide a relationship between F(n) and segment size n; F(n) typically increases with segment size. The slope of the line relating log F(n) to log(n) provides the scaling exponent (α), with α1 based on smaller segments (4-11 beats) and α2 on larger segments (12-64 beats). A value α = 0.5 indicates that no correlations exist in the time series (corresponding to white noise), while α = 1 indicates the presence of fractal correlation properties (fractal dimension property) and self-similarity. In biological signals α1 is typically between 0.8 and 1.2 for healthy subjects [7].

Sample Entropy is the negative natural logarithm of an estimate of the conditional probability that sub-series (epochs) of length m that match pointwise within a tolerance r also match at the next point [8]. D2 is the correlation dimension based on fractal analysis.

Other measures are derived from properties of the recurrence plot, which is constructed by comparing all samples with all others to create a square matrix. At each location of the matrix a point is placed if the two samples are sufficiently close; otherwise the location is left blank. The recurrence rate (REC) is the percentage of points drawn on the plot. The determinism (DET) is the percentage of points that form lines parallel to the main diagonal. The mean line length (Lmean) is the mean length of the lines parallel to the main diagonal [9].

More recent analysis methods, including entropy measures, have shown increased sensitivity for identifying the risk of future morbidity and mortality in cardiac patients. Renyi entropy, along with several other measures, has shown significant differentiation of cardiovascular disease. In previous work we have shown that Renyi entropy can distinguish CAN from controls [1]. In this work we show that it is useful in distinguishing CAN even in the early stages of the disease.

II. THE MULTI-SCALE RENYI ENTROPY

The multi-scale Renyi entropy was introduced and applied to physiologic time series by [10]. Renyi entropy H_α is a
930 2013 IEEE 8th Conference on Industrial Electronics and Applications (ICIEA)
generalisation of the Shannon entropy:

    H_α(X) = (1/(1−α)) log₂ Σ_{i=1}^{n} p_i^α    (2)

where p_i is the probability that X = x_i and α is the order of the entropy measure. Varying α produces the multiscale entropy. The probability can be estimated by comparing sample i with all other samples: the probability density function of all other samples x_j is estimated, and p_i is then taken as the probability given by this density function:

    p_i = Σ_{j=0}^{n} exp( −dist_ij² / (2σ²) )    (3)

where σ is a parameter controlling the width of the density function and dist() is a distance measure:

    dist_ij = sqrt( Σ_{k=0}^{π} (x_{i+k} − x_{j+k})² )    (4)

Here, π is the pattern length over which comparison occurs. In this work the multi-scale Renyi entropy was calculated over the range −5 < α < +5, where α = 0 is the Shannon entropy and α = 2 is the squared entropy.

III. COMPUTATIONAL INTELLIGENCE METHODS

This work relies on a number of data-analytic methods, machine learning and genetic algorithms. Principal Component Analysis (PCA) is a transformation from a multidimensional space to a space of smaller dimension consisting of uncorrelated variables. It is commonly used for dimensionality reduction in data analysis and is often one of the first methods used in exploratory data analysis.

Automatic discrimination of different categories or classes (such as early CAN or normal) is a well-studied problem in computational intelligence. Here the key is to determine some relationship between a set of input vectors that represent stimuli and a corresponding set of values on a nominal scale that represent category or class. The relationship is obtained by applying an algorithm to training samples that are 2-tuples <u, z>, consisting of an input vector u and a class label z. The learned relationship can then be applied to instances of u not included in the training set, in order to discover the corresponding class label z [11]. A number of computational intelligence techniques, including neural networks [12], have been shown to be very effective for solving such problems.

In evaluating the performance of any classifier, accuracy is the most common measure used. To avoid bias in reporting this figure, it is necessary to report the accuracy on data not seen by the classifier during training. This requires splitting the available data into two sets, the training set and the holdout set. The classifier is trained on the first dataset and then evaluated by its performance on the holdout set. The holdout set must be chosen carefully to prevent a source of bias. The most popular way of testing involves the cross-validation method [13], where the dataset is divided into a number of subsets. A number of tests are performed in which each subset in turn becomes the holdout set. In this way the classifier is tested against all available data.

A number of automated classifier algorithms are available using the excellent Weka toolbox [14]. For this work we have selected one algorithm from each major group available in Weka, to provide an overview of what performance can be expected. These are briefly discussed below.

The Naive Bayes algorithm [15] estimates prior probabilities by calculating simple frequencies of the occurrence of each feature value given each class, then returns a probability of each class, given an unclassified set of features.

Sequential Minimal Optimisation (SMO) is a classifier based on the Support Vector Machine (SVM). The SVM builds a set of exemplars that define the boundaries of the different classes; SMO builds on this using polynomial kernels [16].

The Nearest Neighbours algorithm [17] simply stores samples. When an example is presented to the classifier, it looks for the nearest match from the examples in the training set and labels the unknown example with the same class.

The Decision Table algorithm divides the dataset into cells, where each cell contains identical records. A record with unknown class is assigned the majority, or most frequent, class represented in the cell [18].

The Decision Tree Induction algorithm [19] uses the C4.5 algorithm to form a tree by splitting each variable and measuring the information gain provided. The split with the most information gain is chosen, and the process is repeated until the information gain provided falls below a threshold.

In order to apply these techniques to real problems, it is necessary to obtain a number of measures, or features, which can form the input vector u. These are the measures discussed above. It is to be expected that, of the many measures available, some are better than others at discriminating between the classes. Methods for choosing the best feature set for detecting cardiac dysfunction have been demonstrated by Teich and co-workers [20]. However, in this study we are concerned with detecting early CAN.

It is well known that using too many features can actually degrade the accuracy of the prediction, so optimising the accuracy of such methods involves a choice not only of classifier algorithm but also of the appropriate features. Kohavi suggests that the optimum feature subset will depend on the classifier model chosen [21]; the subset may therefore be considered a parameter of the model. In addition, Kohavi suggests a Wrapper approach, where the actual classifier algorithm is used to evaluate the features selected and a search is performed for the set of features that maximises classifier accuracy. The Weka package also provides a Wrapper algorithm that can be used in conjunction with a number of search algorithms.

The Genetic Algorithm (GA) is a search method suitable for Wrapper evaluation of the feature subset, and in operation it mimics some features of biological evolution. A GA randomly generates a population of candidate solutions, then modifies those solutions in order to bring them closer to some desired target [22], [23]. Modification usually involves combining two
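The wrapper/GA combination just outlined can be sketched in a few lines. The sketch below is illustrative only: the 1-nearest-neighbour fitness, leave-one-out scoring, population size and operator rates are our assumptions, not the settings used in this study.

```python
import random

def knn_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier,
    using only the feature indices listed in `subset`."""
    if not subset:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][k] - X[j][k]) ** 2 for k in subset)
            if d < best_dist:
                best_label, best_dist = y[j], d
        correct += (best_label == y[i])
    return correct / len(X)

def ga_feature_search(X, y, n_features, pop_size=20, generations=30, seed=1):
    """Wrapper feature selection: a GA searches over feature-subset
    bitmasks, and the classifier's own accuracy is the fitness."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]

    def fitness(mask):
        return knn_accuracy(X, y, [k for k, bit in enumerate(mask) if bit])

    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]           # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:                 # occasional bit-flip mutation
                k = rng.randrange(n_features)
                child[k] = 1 - child[k]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [k for k, bit in enumerate(best) if bit]
```

On a toy two-feature dataset in which only the first feature separates the classes, the returned subset contains that feature.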
TABLE I
VARIABLES CALCULATED BY ANALYSIS OF THE RR INTERVAL FROM 140 PATIENTS.
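Several of the RR-interval variables of Table I are the time-domain measures defined in the opening section. A minimal sketch of their computation follows; the function name is ours, and pNN50 is taken here as the proportion of successive differences exceeding 50 ms.

```python
import math

def time_domain_measures(rr):
    """Basic time-domain HRV measures for a list of RR intervals in ms."""
    n = len(rr)
    mean_rr = sum(rr) / n
    # SDNN: standard deviation of all RR intervals (population form)
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr) / n)
    # Successive differences between adjacent intervals
    diffs = [rr[i + 1] - rr[i] for i in range(n - 1)]
    # RMSSD: root mean square of the successive differences
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # pNN50: proportion of successive differences exceeding 50 ms
    pnn50 = sum(1 for d in diffs if abs(d) > 50) / len(diffs)
    return {"meanRR": mean_rr, "SDNN": sdnn, "RMSSD": rmssd, "pNN50": pnn50}
```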
IV. METHODOLOGY
TABLE III
VARIABLES CALCULATED BY ANALYSIS OF THE RR INTERVAL FROM 140 PATIENTS.

TABLE IV
VARIABLES PROVIDED FROM A GA SEARCH. STARRED VARIABLES WERE ALSO CONSIDERED SIGNIFICANT IN THE PCA, AS LISTED IN TABLE II.

TABLE V
CORRECT CLASSIFICATION RESULTING FROM VARIABLES SELECTED BY GA.
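Correct-classification rates such as those of Table V are estimated with the cross-validation scheme described in Section III. A minimal sketch follows; the fold count and the trivial majority-class model in the usage example are illustrative assumptions, not the classifiers used in this study.

```python
import random

def cross_validate(X, y, train_fn, predict_fn, k=10, seed=0):
    """Estimate classification accuracy by k-fold cross-validation:
    the data are split into k subsets and each subset in turn serves
    as the holdout set while the remainder trains the model."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k roughly equal subsets
    correct = 0
    for fold in folds:
        hold = set(fold)
        train = [i for i in idx if i not in hold]
        model = train_fn([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict_fn(model, X[i]) == y[i] for i in fold)
    return correct / len(X)
```

With a trivial majority-class model on a 70/30 class split this returns 0.7, the no-information baseline that any real classifier must beat.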
Fig. 6. Receiver Operating Curve for Early diagnosis found from applying the Nearest Neighbour classifier to variables selected by GA.

the Naive Bayes classifier. This gave a correct classification on 64.5% of unseen data, with an area under the ROC curve of 0.713. Using a reduced feature set selected using a Wrapper algorithm and GA, the best performance was found for the Nearest Neighbour classifier. This gave a correct classification on 68.1% of unseen data, with an area under the ROC curve of 0.734. Both approaches used a range of measures, including those based on the time and frequency domains and entropy measures. This is a very successful result, and shows the possibility of detection of early CAN, with all the benefits that brings to life expectancy and quality of life for the patient.

The Wrapper results are more physiologically meaningful than those of PCA, with short-time correlations being identified, indicative of parasympathetic failure, which is related to early CAN. Of course, since the Ewing test is not 100% accurate, we may be seeing a sub-sample, and hence the inclusion of LF power and alpha2, which are usually more related to late CAN and sympathetic activity. Alternatively, there may already be some long-range (sympathetic) increase not otherwise identified with different methods.

Many HRV measures have been suggested as diagnostic aids in the past; however, it needs to be considered that the pathology being investigated will be characterised by certain specific features in the ECG, and therefore no single HRV test should be expected to be ideal for all pathologies. For the detection of early CAN the most successful measures were derived using DFA and Renyi entropy. The multi-scale Renyi entropy performs at a high level of accuracy and should be included as a neuroendocrine test for CAN.

REFERENCES

[1] H.F. Jelinek, M.P. Tarvainen, and D.J. Cornforth, Renyi Entropy in Identification of Cardiac Autonomic Neuropathy in Diabetes, Proceedings of the Computers in Cardiology Conference 2013 (in press).
[2] R.E. Kleiger, J.P. Miller, J.T. Bigger Jr. and A.J. Moss, Decreased heart rate variability and its association with increased mortality after acute myocardial infarction. The American Journal of Cardiology, Vol. 59, 1987, pp. 256-62.
[3] S.Z. Abildstrom, B.T. Jensen, E. Agner, et al., Heart rate versus heart rate variability in risk prediction after myocardial infarction. Journal of Cardiovascular Electrophysiology, Vol. 14, 2003, pp. 168-73.
[4] Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology, Heart Rate Variability: Standards of Measurement, Physiological Interpretation, and Clinical Use. Circulation, Vol. 93, 1996, pp. 1043-1065.
[5] C.K. Karmakar, A.H. Khandoker, A. Voss and M. Palaniswami, Sensitivity of temporal heart rate variability in Poincaré plot to changes in parasympathetic nervous system activity. BioMedical Engineering OnLine, Vol. 10, No. 17, 2011. Available at http://www.biomedical-engineering-online.com/content/10/1/17, accessed Dec 2012.
[6] C.K. Peng, S. Havlin, H.E. Stanley, and A.L. Goldberger, Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. Chaos, Vol. 5, 1995, pp. 82-7.
[7] A.C. Flynn, H.F. Jelinek and M.C. Smith, Heart rate variability analysis: a useful assessment tool for diabetes associated cardiac dysfunction in rural and remote areas. Aust J Rural Health, Vol. 13, 2005, pp. 77-82.
[8] J.S. Richman and J.R. Moorman, Physiological time series analysis using approximate entropy and sample entropy. Am J Physiol, Vol. 278, No. 6, 2000, pp. H2039-H2049.
[9] L. Cimponeriu and A. Bezerianos, Simplified recurrence plots approach on heart rate variability data. Proceedings of the conference on Computers in Cardiology, pp. 595-598.
[10] M. Costa, A.L. Goldberger and C.K. Peng, Multiscale entropy analysis of complex physiologic time series. Physical Review Letters, Vol. 89, 2002.
[11] T.G. Dietterich and G. Bakiri, Solving Multiclass Learning Problems Via Error-Correcting Output Codes, Journal of Artificial Intelligence Research, Vol. 2, 1995, pp. 263-286.
[12] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, John Wiley and Sons, New York, 1973.
[13] B. Efron, Estimating the error rate of a prediction rule: improvement on cross-validation, Journal of the American Statistical Association, Vol. 78, No. 382, 1983, pp. 316-330.
[14] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999.
[15] T. Bayes, An essay towards solving a problem in the doctrine of chances, Philosophical Transactions of the Royal Society of London, Vol. 53, 1763, pp. 370-418.
[16] J. Platt, Fast Training of Support Vector Machines using Sequential Minimal Optimization. In B. Schoelkopf, C. Burges, and A. Smola (eds.), Advances in Kernel Methods - Support Vector Learning, MIT Press, 1998.
[17] R.A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics, Vol. 7, No. 2, 1936, pp. 179-188. Reprinted in Contributions to Mathematical Statistics, Wiley, 1950.
[18] R. Kohavi, The Power of Decision Tables, in: Proceedings of the European Conference on Machine Learning, Lecture Notes in Artificial Intelligence, Vol. 914, Springer Verlag, 1995, pp. 174-189.
[19] J.R. Quinlan, Induction of Decision Trees, Machine Learning, Vol. 1, No. 1, 1986, pp. 81-106.
[20] M.C. Teich, S.B. Lowen, B.M. Jost, K. Vibe-Rheymer, and C. Heneghan, Heart Rate Variability: Measures and Models, in M. Akay (ed.), Nonlinear Biomedical Signal Processing, Vol. II, Dynamic Analysis and Modelling, IEEE Press, New York, 2001.
[21] R. Kohavi and G. John, Wrappers for Feature Subset Selection. Artificial Intelligence Journal, special issue on relevance, Vol. 97, No. 1-2, 1996, pp. 273-324.
[22] J. Holland, Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.
[23] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
[24] H.F. Jelinek, C. Wilding and P. Tinley, An innovative multi-disciplinary diabetes complications screening programme in a rural community: A description and preliminary results of the screening. Australian Journal of Primary Health, Vol. 12, 2006, pp. 14-20.
[25] M. Javorka, Z. Trunkvalterova, I. Tonhajzerova, J. Javorkova, K. Javorka and M. Baumert, Short-term heart rate complexity is reduced in patients with type 1 diabetes mellitus. Clin Neurophysiol, Vol. 119, 2008, pp. 1071-81.
[26] A.H. Khandoker, H.F. Jelinek and M. Palaniswami, Identifying diabetic patients with cardiac autonomic neuropathy by heart rate complexity analysis. Biomed Eng Online, Vol. 8, 2009.