

Computers in Biology and Medicine 93 (2018) 84–92

Contents lists available at ScienceDirect

Computers in Biology and Medicine


journal homepage: www.elsevier.com/locate/compbiomed

Detecting atrial fibrillation by deep convolutional neural networks


Yong Xia a,*,1, Naren Wulan a,1, Kuanquan Wang a, Henggui Zhang a,b

a School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
b Biological Physics Group, Department of Physics and Astronomy, University of Manchester, Manchester, UK

ARTICLE INFO

Keywords: Atrial fibrillation; Short-term Fourier transform; Stationary wavelet transform; Deep convolutional neural networks

ABSTRACT

Background: Atrial fibrillation (AF) is the most common cardiac arrhythmia. The incidence of AF increases with age, causing high risks of stroke and increased morbidity and mortality. Efficient and accurate diagnosis of AF based on the ECG is valuable in clinical settings and remains challenging. In this paper, we propose a novel method with high reliability and accuracy for AF detection via deep learning.
Method: The short-term Fourier transform (STFT) and stationary wavelet transform (SWT) were used to analyze ECG segments to obtain two-dimensional (2-D) matrix input suitable for deep convolutional neural networks. Two different deep convolutional neural network models, corresponding to the STFT output and the SWT output, were then developed. In contrast to existing algorithms, our new method requires neither detection of P or R peaks nor hand-designed features for classification. Finally, the performances of the two models were evaluated and compared with those of existing algorithms.
Results: Our proposed method demonstrated favorable performance on ECG segments as short as 5 s. The deep convolutional neural network using input generated by STFT presented a sensitivity of 98.34%, specificity of 98.24% and accuracy of 98.29%. For the deep convolutional neural network using input generated by SWT, a sensitivity of 98.79%, specificity of 97.87% and accuracy of 98.63% was achieved.
Conclusion: The proposed method using deep convolutional neural networks shows high sensitivity, specificity and accuracy, and is therefore a valuable tool for AF detection.

1. Introduction

Atrial fibrillation (AF) is the most common cardiac arrhythmia, with a prevalence of 1.5%–2% in developed countries [1]. It affects 2.2 million people in the United States and 4.5 million in the European Union [2]. The prevalence increases with age [3], reaching 6%–8% in people over 65 years of age [4,5]. In an aging society such as the US, it is estimated that the number of AF patients will increase 2.5-fold in the next 50 years [6]. Though AF itself does not represent a lethal condition, it can increase risks of morbidity or even mortality due to AF-related complications, such as cardiac failure and atrial thrombosis [7]. It has been reported that the presence of AF is related to a five-fold risk of stroke [8] and a three-fold risk of developing heart failure (HF) [3], independent of other risk factors. Also, AF patients have a two-fold risk of death compared with healthy people of the same age [1]. Moreover, AF may lead to high hospitalization rates and extensive utilization of health resources, imposing a great clinical and economic burden on society [9].

Clinically, AF can be classified into the following types depending on the episode duration: paroxysmal AF, persistent AF and permanent AF [10]. A paroxysmal AF episode terminates spontaneously within seven days, whereas a persistent AF episode lasts more than seven days and requires an external intervention, such as electrical or pharmacological cardioversion [1]. Permanent AF is the case in which heart rate control interventions are not pursued [11]. AF is adverse to blood flow dynamics and may even lead to a stroke. It could be feasible to reduce the incidence of AF-induced complications by early and accurate diagnosis of AF.

Current AF diagnoses primarily rely on the presence of some typical symptoms (e.g., dyspnea, chest pain, dizziness) of the patients and features of the recorded electrocardiogram (ECG). However, early and accurate diagnosis of AF remains a challenge due to the following impeding factors:

● Symptoms of AF actually have poor correlation with AF occurrence [12,13]: AF-like symptoms may not be presented in some patients with AF, and many paroxysmal AF episodes may be asymptomatic [14].

* Corresponding author.
E-mail address: xiayong@hit.edu.cn (Y. Xia).
1 Joint first authors.

https://doi.org/10.1016/j.compbiomed.2017.12.007
Received 7 July 2017; Received in revised form 13 December 2017; Accepted 13 December 2017

0010-4825/© 2017 Elsevier Ltd. All rights reserved.



● Clinicians need to have well-trained professional knowledge and skills to accurately interpret the ECG. Additionally, it is rather time-consuming to visually examine ECG signals.

Therefore, it is valuable to develop algorithms for automatic detection of AF where a fast, accurate and reliable diagnosis is expected. Currently, a wide variety of algorithms have been developed for automatic AF detection. These algorithms mostly rely on two main characteristics manifested by AF on the ECG: (1) absence of P-waves (replaced by a series of low-amplitude oscillations called fibrillatory waves); (2) irregularity of R-R intervals. In the presence of noise, AF detection algorithms that rely solely on the absence of P-waves [15,16] perform poorly, since noise and drifting of the signal baseline may contaminate P-waves [17]. Many R-R interval-based methods [18–25] usually require long segments of data (50–100 beats) to identify long AF episodes, and present limitations when dealing with very short AF episodes (less than 1 min) [17]. This deficiency may increase the time to AF detection, and may make judgment of the AF boundary less accurate. As a consequence, the performance of these existing algorithms strongly depends on the detection of P or R peaks. If the relevant peaks are missed or detected by mistake, their performance may significantly degrade.

In a previous study, Asgari and others [26] proposed an approach that eliminates the need for P-peak or R-peak detection: they applied the wavelet transform to extract the peak-to-average power ratio and log-energy entropy as a feature vector for AF detection. Using such hand-crafted and fixed features, however, may not represent the optimal characteristics of the signal. Furthermore, due to the challenges of extracting reliable features, methods that rely on hand-crafted feature extraction may not be broadly applied clinically, and their accuracy and efficiency may vary greatly when dealing with large data sets [27,28].

Aiming to address the deficiencies and drawbacks of existing AF detection algorithms, we propose a novel method for automatic AF detection based on deep convolutional neural networks (DCNNs). Deep learning develops computational models consisting of multiple processing layers which can learn abstract representations (called feature maps) of data [29]. Deep learning allows a machine to take raw data as input and to automatically discover the representations needed for detection or classification. To discover intricate structure in data sets, deep learning transforms the representation at one level into a more abstract representation at a higher level through a non-linear function, and then uses the backpropagation algorithm to update the parameters that compute the representation of each layer. With sufficient transformations and updates, a machine can learn a complex function for feature extraction and classification [30].

Over the past years, deep learning has proved successful in speech recognition, image recognition, object detection and many other domains such as drug discovery and genomics [30]. However, deep learning has not been widely used in ECG analysis and classification because of small training collections and the specificity of ECGs. Although a few studies have used convolutional neural networks for detection of ventricular ectopic beats and supraventricular ectopic beats [27], and auto-encoders for ECG signal enhancement [31], the performance and applications remain unsatisfactory when compared to image classification and speech recognition. How to build an effective learning framework for ECGs is still a challenging problem.

To our knowledge, this is the first study using DCNNs for the purpose of AF detection. Compared with traditional AF detection algorithms, the proposed method eliminates the detection of P-waves and R-waves and avoids the need for any manual feature extraction. With proper training, the convolutional layers of DCNNs can learn to extract features. Moreover, it is important to remark that we use ECG signals of very short duration (just 5 s) to perform AF detection.

The rest of this paper is organized as follows: Section 2 gives a detailed description of the proposed methods. The results are presented in Section 3. Section 4 discusses the results and compares them with those of existing methods. Section 5 concludes the paper and outlines future work.

2. Materials and methods

2.1. Overview

As illustrated in Fig. 1, an ECG signal is segmented by a certain length of time T to obtain a number of data segments of length T. Each data segment is then passed through a band-pass filter to remove noise. Next, a signal conversion is performed for each data segment. Note that in the signal conversion phase, only one method of signal conversion – either the short-term Fourier transform or the stationary wavelet transform – is selected and performed. The purpose of signal conversion is to enable a one-dimensional (1-D) ECG signal to be applied to a convolutional neural network that processes two-dimensional (2-D) data. Finally, the result of the signal conversion is taken as the input of the 2-D DCNN. The network automatically extracts the data features and performs classification.

Fig. 1. Schematic overview of the proposed approach.

2.2. Pre-processing

The ECG signal is divided into segments of 5 s each. In order to remove baseline wander, muscle noise and power-line interference, an elliptical band-pass filter with a filter order of 10 and a passband of 0.5–50 Hz is applied to each segment. The sequence filtered in the forward direction is reversed and run back through the filter to obtain zero-phase distortion.

2.3. Analysis of input representations for DCNNs

In this section, we discuss how to organize ECG input representations in order to make them suitable for DCNN processing. DCNNs are often used for image recognition, in which the input is organized as a 2-D array. For color images, the RGB (red, green, blue) values can be regarded as three different 2-D feature maps. The input "image" for AF detection can roughly be regarded as a spectrogram. DCNNs focus on local structures of the input data. The local correlation in both the spectral and temporal dimensions of ECG signals makes the application of DCNN visual models possible. Therefore, a time-frequency transform is used to characterize ECG signals as DCNN input.

2.3.1. Short-term Fourier transform

The short-term Fourier transform (STFT) has been an effective method to analyze frequency information along with time [32]. The STFT functions as a trade-off between a time-based and a frequency-based representation by determining the frequency and phase of local sections of a signal as it changes over time. In this way, the STFT shows the frequencies contained in the signal as well as the corresponding points in time. The STFT is based on the Fourier transform of short fragments, which are sampled by moving a window g(t) – commonly a Hamming window or Gaussian window centered around zero – over a long time signal f(t). Mathematically, the STFT is defined as:

Y(ω, u) = STFT{f(t)} = ∫_R f(t) g(t − u) e^(−jωt) dt   (1)

The energy surface distribution of the STFT (the spectrogram) can be computed by:

E(ω, u) = |Y(ω, u)|²   (2)

In order to obtain a good resolution, the choice of window size is critical: for wide windows, the temporal resolution at high frequencies becomes poor; for narrow windows, the temporal resolution is good but the frequency resolution is unsatisfactory. Indeed, it is impossible to find a time-frequency representation that is accurate in both time and frequency. Consequently, the STFT converts a time domain signal into a 2-D time-frequency representation and shows the frequency domain variation of the signal within the window function.

2.3.2. Wavelet transform

Although the STFT is a good way to analyze signals in the time-frequency domain, its window is unchanged. An alternative approach to overcoming this problem of the STFT is the wavelet transform (WT). The advantage of the wavelet transform is multiresolution analysis, which means the window automatically narrows when focusing on high frequencies and automatically widens when focusing on low frequencies. Using the wavelet transform, one can observe the characteristics of the signal emphasized at different scales [33].

The WT of a signal f(t) is defined as:

T(a, b) = (1/√a) ∫_R f(t) ψ((t − b)/a) dt   (3)

where a is the scale factor, b is the translation, and ψ is the analyzing wavelet. For small a, the wavelet has good localization because it is contracted in the time domain, so the wavelet transform gives detailed (or high-frequency) information about the signal. For large a, the wavelet is expanded and the wavelet transform gives a global view of the signal.

Traditionally, the wavelet transform has been used for de-noising, delineation, and compression of the signal. The continuous wavelet transform (CWT) and discrete wavelet transform (DWT) are not time-invariant, which means that temporal shifts in the signal will generate different sets of wavelet transform coefficients [34]. On the contrary, the stationary wavelet transform (SWT) is time-invariant at each decomposition level. Since the SWT with J levels on a signal requires the signal length to be a multiple of 2^J, the filtered data segment needs to be zero-padded.

Fig. 2 presents a tree-structure diagram of the stationary wavelet transform. In Fig. 2, X(z) is the zero-padded signal in the z-domain, and G(z) and H(z) are the high-pass and low-pass filters. Note that at each SWT decomposition level, the impulse responses of the high-pass and low-pass filters are up-sampled by a factor of two. The detail coefficients D_j(n) and coarse coefficients C_j(n) in the time domain can be recursively computed as:

D_j(n) = Σ_m g(m) D_{j−1}(n − 2^(j−1) m)   (4)

C_j(n) = Σ_m h(m) C_{j−1}(n − 2^(j−1) m)   (5)

Fig. 2. Tree-structure of 3-level stationary wavelet transform.

2.3.3. Comparison between input representations

At the end of the pre-processing step, we employed the STFT and the SWT, respectively, on each 5-s signal segment (mentioned in Section 2.2) to obtain two kinds of representations of the filtered raw data. It is worth noting that we use the STFT to generate a spectrogram of each signal segment, but use the SWT to generate 2J (J detail coefficient and J coarse coefficient) time series for each segment. Each coefficient time series has the same time resolution as the original signal segment. The 2J time series are organized in a 2-D matrix in which every row stands for one coefficient time series. This 2-D matrix of coefficient time series can be regarded as a grayscale "image." Thus, these two types of representations can both be used as 2-D inputs of DCNNs. We design two DCNN architectures, one in which the input is a spectrogram generated by the STFT, and another in which the input is a 2-D matrix of coefficient time series generated by the SWT. Fig. 3 shows the spectrogram input type and Fig. 4 shows the 2-D matrix of coefficient time series input type normalized to [-1, 1]. As mentioned above, the 2-D matrix of coefficient time series consists of the 2J time series, where J is the number of SWT levels. In this study, J is set to six, so Fig. 4 has a total of 12 rows (6 rows of detail coefficients and 6 rows of coarse coefficients). The 2-D matrix of coefficient time series can be visualized graphically.

2.4. The basic process of the DCNN architecture

DCNNs are a popular type of deep learning architecture and are primarily composed of three types of layers: convolutional layers, pooling layers and fully-connected layers. A convolutional layer applies a set of weights (called a filter bank or kernel) to process small local parts of the feature map or the raw input. Feature maps consist of many neurons called units. Each unit in the feature maps of the current convolutional layer is connected to local areas in the feature maps of the previous layer through the filter bank. The locally weighted sum is then passed through a non-linear function such as a ReLU [35].

It is noteworthy that, in a convolutional layer, different feature maps use different filter banks, but all units of one feature map share the same filter bank. The convolutional layer contributes to the detection of local combinations of features from the previous layer, while the pooling layer merges similar features into one. Because neighboring units are correlated with each other, reliable detection can be achieved by generating a lower-resolution feature map. The pooling layer can reduce the dimension of the feature maps and the number of parameters, and creates invariance to translation and distortion. At the end of a DCNN, there are usually some fully-connected layers. The top fully-connected layers perform the classification task and produce the final class vector. The backpropagation algorithm used in the DCNN adjusts all the weights in all filter banks as it does in a conventional neural network.

Fig. 3. A 5-s spectrogram used as the input of DCNNs.

Fig. 4. Graphical representation of a 5-s 2-D matrix of coefficient time series used as the input of DCNNs.

3. Results

In this section, we show the details of our two DCNN architectures, including the number of convolutional layers and fully-connected layers, the number of hidden units, the kernel size and the pooling size. For simplicity, the DCNN architecture which uses the spectrogram as input is called DeepNet1, and the DCNN architecture which uses the SWT coefficient time series as input is called DeepNet2. We then compare the performance of our proposed DeepNet1 and DeepNet2 with those of existing AF detection algorithms.

3.1. Patient data

3.1.1. Database

The MIT-BIH Atrial Fibrillation (MIT-BIH AFIB) data set [36], which is the most popular database for AF detection, is used to evaluate the performance of the proposed algorithm. It is publicly accessible from PhysioNet. This data set contains 23 annotated ECG recordings, each about 10 h long, with a sampling rate of 250 Hz and 12-bit resolution over a range of ±10 mV. The data set includes 605 episodes: 12 episodes of junctional rhythm, 14 episodes of atrial flutter, 288 episodes of all other rhythms and 291 episodes of atrial fibrillation. Each ECG recording contains two leads, called ECG1 and ECG2. In this study, we only use the ECG1 lead of each recording for AF detection.

3.1.2. Data analysis

The original beat-to-beat annotated ECG signal is divided into 5-s segments by using a percentage threshold parameter P [19]: a data segment is considered to be genuine AF only when the percentage of annotated AF beats in that data segment is greater than or equal to P. In contrast, a non-AF data segment is one in which the percentage of annotated AF beats is less than P. According to the experimental results obtained by Asgari and others [26], we set P to 50%. In our study, we extracted a total of 162,536 5-s data segments, of which 61,924 are true AF data segments and 100,612 are non-AF data segments. The non-AF set contains not only sinus rhythm, but also other arrhythmias. The number of non-AF data segments is far greater than that of AF data segments; that is, the samples are unbalanced. To address this problem, we randomly extracted the same number of non-AF data segments as AF data segments. Then, we divided the 123,848 samples into a training set and a test set in a proportion of 9:1. The training set has 111,462 samples and the test set has 12,386 samples. A ten-fold cross-validation on the 123,848 samples is employed to train and test the DCNNs.

As introduced in Section 2.3.1, we use the STFT with a Hamming window of 128 samples on each 5-s data segment to generate a corresponding spectrogram, which is used as the DeepNet1 input. We use the SWT on each 5-s data segment to generate a corresponding 2-D coefficient time series matrix, used as the DeepNet2 input. We chose the Daubechies 5 wavelet as the mother wavelet, an orthogonal wavelet whose shape is similar to the ECG waveform [37], to implement the wavelet analysis. Considering the sampling frequency of 250 Hz and the frequency range of atrial activity (4–9 Hz) [38], we set the number of wavelet decomposition levels J to 6.

Fig. 6. Feature maps and weight map of DeepNet1. A: Feature map; and B: Weight map.

A 5-s ECG waveform prior to and during AF is presented in Fig. 5, along with its detail wavelet coefficients at levels 4 to 6 (D4, D5, D6). Since the sampling frequency is 250 Hz, this signal contains 1250 sampling points. From Fig. 5, it can be observed that the atrial activity can be perceived in the wavelet domain, especially in the detail coefficients.
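As a concrete illustration of Eqs. (1) and (2) with the parameters stated above, the spectrogram of one 5-s, 1250-sample segment can be computed with a sliding 128-sample Hamming window. This is only a sketch: the hop size of 64 samples (50% overlap) and the toy sinusoid are assumptions, since the paper does not state the window overlap.

```python
import numpy as np

def spectrogram(x, win_len=128, hop=64):
    """Magnitude-squared STFT (Eq. (2)) of a 1-D signal using a Hamming window."""
    w = np.hamming(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    # one FFT per windowed frame: rows = frequency bins, columns = time frames
    frames = [np.fft.rfft(w * x[s:s + win_len]) for s in starts]
    return np.abs(np.array(frames)).T ** 2

fs = 250                           # MIT-BIH AFIB sampling rate (Hz)
t = np.arange(1250) / fs           # one 5-s segment
x = np.sin(2 * np.pi * 7 * t)      # toy signal in the 4-9 Hz atrial band
S = spectrogram(x)                 # 2-D time-frequency input for DeepNet1
```

With these settings the output has 65 frequency rows (one-sided FFT of a 128-sample frame) and 18 time columns, and the energy of the 7 Hz tone concentrates near the corresponding low-frequency bins.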

3.2. Deep features vs traditional features

Fig. 7. Feature maps and weight map of DeepNet2. A: Feature map; and B: Weight map.

Fig. 6(A) shows the feature maps extracted by the first convolutional layer of DeepNet1, which has 16 convolutional kernels of size 5 × 5. Fig. 6(B) shows the weight maps (also called kernel matrices) of the first convolutional layer. An RGB spectrogram has three feature dimensions of red, green, and blue; each dimension is processed by the 16 kernels to perform a convolution operation, so that a total of 48 weight maps are generated. It is worth noting that for the three RGB channels, the actual size of a convolutional kernel is K × K × 3 (where K is the kernel size), so each convolutional kernel has three K × K weight maps. In DeepNet1, the first convolutional layer has 16 kernels with a kernel size of 5 × 5, so it has 48 weight maps of size 5 × 5. The corresponding 5 × 5 kernel on each channel is used to convolve the image on that channel, and the outputs of the three channels are then added to give 16 feature maps instead of 48.

Fig. 7(A) shows the feature maps extracted by the first convolutional layer of DeepNet2, which has 32 convolutional kernels of size 3 × 11. Unlike the feature maps and weight maps of DeepNet1, those of DeepNet2 are more complex and present more detail of a 5-s signal segment. This also demonstrates, from another perspective, that the input type based on the SWT gives a better account of the original signal in the time and frequency domains than the input form based on the STFT. Another point to note is that the input type based on the SWT is just a 2-D matrix of coefficient time series. It can be regarded as a grayscale "image" rather than an RGB image, so its 32 convolutional kernels generate only 32, not 96, weight maps, as shown in Fig. 7(B).

It is not hard to see that traditional feature descriptors usually rely on the absence of P-waves, R-R irregularities, or hand-crafted features. For example, many R-R interval-based methods compare the density histogram of R-R intervals. For traditional features, P and/or R peaks need to be located, or researchers need very high expertise to manually select features for AF detection. As a result, their performance depends on the accuracy of peak detection or the reliability of the hand-crafted features. When peaks are missed or erroneously detected, or the hand-crafted features are not appropriate, their performance will be compromised. In contrast, the deep features shown in Figs. 6 and 7 give a comprehensive description of a signal segment from different angles and multiple layers. This demonstrates that deep learning methods can obtain more detailed feature information from the signal. Therefore, deep features can produce more robust and stable performance than traditional methods. Moreover, deep features are diverse across levels: from low layers to high layers, the features change from concrete and delicate to abstract and visual, a progression from local to global.

3.3. Comparisons of DCNN architectures

In this section, we analyze the best DCNN architecture using the SWT coefficient time series as input and using the spectrogram as input for AF detection, respectively. Furthermore, the Caffe deep learning framework [39] is used for performance evaluation.

Tables 1 and 2 show the different DCNN architectures which use the SWT coefficient time series as input and the spectrogram as input, respectively. It can be observed in Tables 1 and 2 that, as the number of layers increases, the test accuracy also increases. Experiments 3 and 5 in Table 1, and Experiments 1 and 2 in Table 2, did not record a test accuracy since the DCNN did not work in these experiments: the training error did not decrease substantially; that is, the parameters in the network were not valid for classification. Note that the second column (the number of convolutional layers) in Tables 1 and 2 counts combination layers, each consisting of a convolutional layer, a ReLU layer and a pooling layer, not just a convolutional layer. For example, the second column of Experiment 1 in Table 1 is one, which means one convolutional layer, one ReLU layer and one pooling layer. The second column of Experiment 4 in Table 2 is three, which means three convolutional layers, three ReLU layers and three pooling layers. Although the second-column entries of Experiments 3 and 4 in Table 1 are both two, Experiment 4 has no ReLU layer between the second convolutional layer and the second pooling layer.

Table 1
Comparison of different DCNN architectures which use the SWT coefficient time series as input.

# of experiment | # of convolutional layers | # of fully-connected layers | Test accuracy
1 | 1 | 1 | 92.18%
2 | 1 | 2 | 95.21%
3 | 2 | 2 | –
4 | 2 (without ReLU) | 2 | 98.63%
5 | 3 | 2 | –

Table 2
Comparison of different DCNN architectures which use the spectrogram as input.

# of experiment | # of convolutional layers | # of fully-connected layers | Test accuracy
1 | 1 | 1 | –
2 | 1 | 2 | –
3 | 2 | 2 | 96.23%
4 | 3 | 2 | 98.29%
5 | 4 | 2 | 97.66%

As a result, the best DCNN architecture using the SWT coefficient time series as input has two combination layers (convolutional layer, ReLU layer and pooling layer) – with no ReLU layer between the second convolutional layer and the second pooling layer – and two fully-connected layers. The best DCNN architecture using the spectrogram as input has three combination layers and two fully-connected layers. It can also be observed in Tables 1 and 2 that increasing the number of combination layers or fully-connected layers beyond the current best architecture leads to poorer test accuracy.

Fig. 5. An ECG signal prior to and during AF and its detail wavelet coefficients from levels 4 to 6. (A) Original signal; (B) detail coefficient at level 4; (C) detail coefficient at level 5; and (D) detail coefficient at level 6.

3.4. The architecture of DeepNet1

Through Section 3.3 and further experiments, we obtained the detailed parameter settings for DeepNet1 and DeepNet2. Fig. 8 shows the architecture of DeepNet1. DeepNet1 is composed of three convolutional layers, three max-pooling layers, two ReLU layers, one dropout layer, two fully-connected layers and one softmax layer. The first convolutional layer, with a kernel size of 5 × 5, has 16 hidden units followed by a ReLU layer. The pooling size of the first pooling layer is 3 × 3. The settings of the following convolutional, ReLU and pooling layers are the same as those of the first. The first fully-connected layer has 50 hidden units followed by a ReLU layer and then a dropout layer. The second fully-connected layer has two hidden units. Lastly, there is a softmax layer with two outputs. The learning rate is fixed at 0.0008. The momentum and weight decay rates are set to 0.9 and 0.0005, respectively, according to [40]. The optimization algorithm used in both DeepNet1 and DeepNet2 is stochastic gradient descent (SGD). Furthermore, instead of a single large convolutional kernel, multiple layers of small-scale convolutional kernels are used in order to reduce the number of parameters and increase the non-linearity of the network. Before training, we calculate the average of all training samples and subtract this average from the samples, which can improve the training speed and accuracy. In subsequent tests, the same average is also subtracted from all test samples, so that the average of the test samples does not need to be recalculated.

Fig. 8. The DCNN architecture for the detection of atrial fibrillation using the spectrogram as input.

3.5. The architecture of DeepNet2

Table 3 shows the architecture of DeepNet2. Since the architecture of DeepNet2 is similar to that of DeepNet1 shown in Fig. 8, it is illustrated using a table. DeepNet2 is composed of two convolutional layers, two max-pooling layers, two ReLU layers, one dropout layer, two fully-connected layers and one softmax layer. The first convolutional layer, with a kernel size of 3 × 11, has 32 hidden units followed by a ReLU layer. The pooling size of the first pooling layer is 2 × 3. The second convolutional layer, with a kernel size of 2 × 11, has 32 hidden units. The pooling size of the second pooling layer is 2 × 3. The first fully-connected layer has 100 hidden units followed by a ReLU layer and then a dropout layer. The second fully-connected layer has two hidden units. Lastly, there is a softmax layer with two outputs. The initial learning rate is set to 0.01 and is reduced by a factor of 0.1 every 5000 iterations. The momentum and weight decay rates are set to 0.9 and 0.0005, respectively.

Table 3
The detailed architecture of DeepNet2.

Layer | # of outputs | Kernel size/Pooling size
Conv1 | 32 | 3 × 11
ReLU1 | 32 |
Pool1 | 32 | 2 × 3
Conv2 | 32 | 2 × 11
Pool2 | 32 | 2 × 3
Fc1 | 100 |
ReLU2 | 100 |
Dropout1 | |
Fc2 | 2 |
Softmax | 2 |

3.6. Comparison of performance between DeepNet1 and DeepNet2

Fig. 9 shows the detection results of the proposed method DeepNet1 on ECG signal segments, where 1 means AF and 0 means non-AF. Fig. 10 shows the detection results of the proposed method DeepNet2 on the same ECG signal segments as in Fig. 9. The left side of Figs. 9 and 10


Fig. 9. Detection results using the proposed method DeepNet1. (A) Original signal; (B) true annotation; (C) score (output of the DCNN classifier); (D) detection result; (E) original signal; (F) true annotation; (G) score (output of the DCNN classifier); (H) detection result.

Fig. 10. Detection results using the proposed method DeepNet2. (A) Original signal; (B) true annotation; (C) score (output of the DCNN classifier); (D) detection result; (E) original signal; (F) true annotation; (G) score (output of the DCNN classifier); (H) detection result.

Fig. 10 shows the detection results of the proposed method DeepNet2 on the same ECG signal segments as in Fig. 9. The left side of Figs. 9 and 10 shows the detection results for the same signal segment of record 04043, which contains an AF episode. From Figs. 9(D) and 10(D), we can observe that there is an erroneous detection around 35 s–50 s in Fig. 9(D), but not in Fig. 10(D). The boundary of the AF segment, however, is accurately positioned in Fig. 9(D), whereas it is advanced by approximately 5 s in Fig. 10(D). The right side of Figs. 9 and 10 shows the detection results for the same signal segment of record 04936, which contains a non-AF episode. From Figs. 9(H) and 10(H), we can observe that the performance is satisfactory in detecting the non-AF segment. The detection score is slightly higher in Fig. 10(G) than in Fig. 9(G) during about 10 s–25 s.
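The ten-fold cross-validated sensitivity, specificity and accuracy reported next are standard quantities; the protocol can be sketched in a few lines of stdlib Python (the fold construction, function names, seed and confusion counts below are our own illustrations, not the authors' code or data):

```python
import random

def ten_fold_indices(n_segments, seed=0):
    """Shuffle segment indices and split them into 10 disjoint folds."""
    idx = list(range(n_segments))
    random.Random(seed).shuffle(idx)
    return [idx[k::10] for k in range(10)]

def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from segment counts (AF = positive)."""
    se = tp / (tp + fn)                    # detected AF / all AF segments
    sp = tn / (tn + fp)                    # rejected non-AF / all non-AF segments
    acc = (tp + tn) / (tp + fn + tn + fp)  # correctly classified / all segments
    return se, sp, acc

# Each fold serves once as the test set while the other nine train the model;
# the three metrics are then averaged over the ten folds.
folds = ten_fold_indices(1000)
print(len(folds))                                            # 10
print(classification_metrics(tp=980, fn=20, tn=975, fp=25))  # (0.98, 0.975, 0.9775)
```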
Based on a ten-fold cross-validation evaluation, DeepNet1 presented a sensitivity of 98.34 ± 0.13%, a specificity of 98.24 ± 0.11% and an accuracy of 98.29 ± 0.17% on average. For DeepNet2, a sensitivity of 98.79 ± 0.12%, a specificity of 97.87 ± 0.15% and an accuracy of 98.63 ± 0.17% on average were achieved. Fig. 11 shows the test accuracy of the proposed methods DeepNet1 and DeepNet2 over the training iterations. It can be observed that their classification accuracies are essentially the same and very close to one.

Fig. 11. Test accuracy of the proposed methods DeepNet1 and DeepNet2 with respect to iterations.

3.7. Training time

The experiments were performed on a computer with two CPUs at 2.1 GHz, two NVIDIA Tesla K40c GPUs and 32 GB of memory. We ran all the proposed deep models on the GPUs using the Caffe deep learning framework [39]. The total training time of DeepNet1 was about 40 min for 30,000 iterations, so the average time for one back-propagation (BP) iteration was about 83 ms. The total training time of DeepNet2 was about 8 min for 30,000 iterations, so the average time for one BP iteration was about 16 ms.

4. Discussion

We performed a ten-fold cross-validation on the data obtained by SWT and STFT to train and test the deep convolutional neural networks. Classification performance was measured using accuracy (Acc), sensitivity (Se) and specificity (Sp). Table 4 summarizes the performance of our methods and of the existing algorithms validated on the MIT-BIH AF database. It is apparent that the specificity of our proposed method DeepNet1, at 98.24%, is the highest of all algorithms listed in Table 4. The specificity of our proposed method DeepNet2 is just 0.13% lower than that of Huang [20]. The sensitivities of our proposed methods DeepNet1 and DeepNet2 are 98.34% and 98.79%, respectively, both higher than those of any other algorithm. The accuracies of our proposed methods DeepNet1 and DeepNet2 are 98.29% and 98.63%, respectively.

Table 4
Comparison of the performances of AF detection algorithms that have been validated on the MIT-BIH AFIB database.

Algorithm                 Sensitivity (%)   Specificity (%)   Accuracy (%)   Methodology
Slocum et al. [15]        62.80             77.46             –              P-wave absence
Tateno et al. [18]        94.40             97.20             –              RR interval irregularity
Dash et al. [19]          94.40             95.10             –              RR interval irregularity
Huang et al. [20]         96.10             98.10             –              RR interval irregularity
Lee et al. [22]           98.20             97.70             –              RR interval irregularity
Babaeizadeh et al. [24]   92.00             95.50             –              RR interval irregularity + P-wave absence
Jiang et al. [25]         98.20             97.50             –              RR interval irregularity + P-wave absence
Asgari et al. [26]        97.00             97.10             –              Peak-to-average power ratio + log-energy entropy
DeepNet1 (RGB)            98.34             98.24             98.29          STFT + DCNNs
DeepNet1 (grayscale)      98.60             97.17             97.74          STFT + DCNNs
DeepNet2                  98.79             97.87             98.63          SWT + DCNNs

Bold refers to the methods proposed in this paper.

For STFT, some researchers used a grayscale spectrogram instead of an RGB spectrogram. Here, we also evaluated DeepNet1 using a grayscale spectrogram image, and the result is provided in Table 4 for comparison. It can be observed that the sensitivity with a grayscale spectrogram is slightly better than that with an RGB spectrogram. In general, however, an RGB spectrogram outperforms a grayscale spectrogram for AF detection. As mentioned in Section 3.2 (regarding the feature maps and weight maps of STFT + DCNN), an RGB spectrogram has three feature channels (red, green and blue); if the first convolutional layer has 16 kernels, then 16 feature maps and 48 weight maps (also called weight matrices) will be produced. A grayscale spectrogram, however, has only one channel; if the first convolutional layer has 16 kernels, then only 16 feature maps and 16 weight maps will be produced. Thus the set of weight matrices learned from an RGB spectrogram is three times larger than that learned from a grayscale spectrogram, and as the number of layers increases, this difference grows considerably. In a convolutional neural network, the weight matrices can be considered to describe and analyze the data from different aspects or angles. So, in a way, the larger number of weight matrices helps the network capture the intrinsic characteristics of the data and, accordingly, can improve the classification accuracy.

Overall, our proposed methods, DeepNet1 and DeepNet2, perform better than the algorithms of Slocum [15], Tateno [18], Dash [19], Huang [20], Lee [22], Babaeizadeh [24], Jiang [25] and Asgari [26]. Specifically, the proposed methods have the following advantages for detecting AF in clinical applications:

● The Tateno [18], Dash [19], Huang [20] and Lee [22] methods need to detect R-R intervals, the Slocum [15] method needs to detect the P-wave, and the Babaeizadeh [24] and Jiang [25] methods need to detect both R-R intervals and the P-wave. Their performance depends on the accuracy of peak detection; when peaks are missed or erroneously detected, their performance may be poor. In contrast, our proposed methods do not rely on peak detection.

● Although the Asgari [26] method does not need to detect R-R intervals or the P-wave, it manually selects the peak-to-average power ratio and the log-energy entropy of the signal as features for AF detection. These features may not represent the optimal characteristics of the signal. Moreover, for AF detection algorithms using hand-crafted features, the detection accuracy depends largely on the quality of the feature selection. Our proposed methods, DeepNet1 and DeepNet2, do not have that limitation: we simply take the original signal, transform it to a 2-D representation with STFT and SWT respectively, and let the DCNNs extract the intrinsic characteristics of the signal and use these learned features to classify.

● Furthermore, some existing methods, such as the Jiang [25] method, need a long data segment (50 beats, about 40 s) to ensure high accuracy in AF detection, which causes some short AF episodes to be undetectable. Our proposed methods can achieve good accuracy on an ECG segment of only 5 s.

5. Conclusion

In this paper, we converted the one-dimensional ECG signal into a two-dimensional form by the short-time Fourier transform and the stationary wavelet transform, respectively, to perform the detection of atrial fibrillation using deep convolutional neural networks. The proposed method requires neither the detection of P or R peaks nor the extraction of hand-crafted features. The results of the experiments, performed on the MIT-BIH Atrial Fibrillation (MIT-BIH AFIB) database, show that the DCNN with the STFT-based input achieves a sensitivity of 98.34%, a specificity of 98.24% and an accuracy of 98.29%, and the DCNN with the SWT-based input achieves a sensitivity of 98.79%, a specificity of 97.87% and an accuracy of 98.63%; both outperform the majority of the existing algorithms. Furthermore, the proposed methods achieve satisfactory performance for data segments as short as 5 s. Therefore, the proposed approach is a fast, accurate and efficient method for atrial fibrillation detection.

Conflicts of interest

None.

Acknowledgment

This work was financially supported by the Shandong Province Natural Science Foundation (ZR2015FM028) and the Shandong Province Science and Technology Development Plan (2014GSF118152).
References

[1] V. Fuster, L.E. Ryden, D.S. Cannom, H.J. Crijns, A.B. Curtis, K.A. Ellenbogen, J.L. Halperin, G.N. Kay, J.-Y. Le Huezey, J.E. Lowe, S.B. Olsson, E.N. Prystowsky, J.L. Tamargo, L.S. Wann, 2011 ACCF/AHA/HRS focused updates incorporated into the ACC/AHA/ESC 2006 guidelines for the management of patients with atrial fibrillation, J. Am. Coll. Cardiol. (2011) 101–198.
[2] M. Lainscak, N. Dagres, G.S. Filippatos, S.D. Anker, D.T. Kremastinos, Atrial fibrillation in chronic non-cardiac disease: where do we stand? Int. J. Cardiol. (2008) 311–315.
[3] A. Camm, G. Lip, R. De Caterina, I. Savelieva, D. Atar, S.H. Hohnloser, 2012 focused update of the ESC Guidelines for the management of atrial fibrillation, Eur. Heart J. (2012) 2719–2747.
[4] S. Colilla, A. Crow, W. Petkun, D.E. Singer, T. Simon, X. Liu, Estimates of current and future incidence and prevalence of atrial fibrillation in the U.S. adult population, Am. J. Cardiol. (2013) 1142–1147.
[5] B.P. Krijthe, A. Kunst, E.J. Benjamin, G.Y.H. Lip, O.H. Franco, A. Hofman, J.C.M. Witteman, B.H. Stricker, J. Heeringa, Projections on the number of individuals with atrial fibrillation in the European Union, from 2000 to 2060, Eur. Heart J. (2013) 2746–2751.
[6] A.M. Gillis, A.D. Krahn, A.C. Skanes, S. Nattel, Management of atrial fibrillation in the year 2033: new concepts, tools, and applications leading to personalized medicine, Can. J. Cardiol. (2013) 1141–1146.
[7] C.D. Furberg, B.M. Psaty, T.A. Manolio, J.M. Gardin, V.E. Smith, P.M. Rautaharju, Prevalence of atrial fibrillation in elderly subjects (the Cardiovascular Health Study), Am. J. Cardiol. (1994) 236–241.
[8] P.A. Wolf, R.D. Abbott, W.B. Kannel, Atrial fibrillation as an independent risk factor for stroke: the Framingham Study, Stroke (1991) 983–988.
[9] C.A. Sanoski, Clinical, economic, and quality of life impact of atrial fibrillation, J. Manag. Care Pharm. (2009) 4–9.
[10] V. Markides, R.J. Schilling, Atrial fibrillation: classification, pathophysiology, mechanisms and drug treatment, Heart (2003) 939–943.
[11] C.T. January, L.S. Wann, J.S. Alpert, H. Calkins, J.E. Cigarroa, J.C. Cleveland Jr., J.B. Conti, P.T. Ellinor, M.D. Ezekowitz, M.E. Field, K.T. Murray, R.L. Sacco, W.G. Stevenson, P.J. Tchou, C.M. Tracy, C.W. Yancy, 2014 AHA/ACC/HRS guideline for the management of patients with atrial fibrillation: executive summary, J. Am. Coll. Cardiol. (2014) 2246–2280.
[12] Q. Xiong, M. Proietti, K. Senoo, G.Y.H. Lip, Asymptomatic versus symptomatic atrial fibrillation: a systematic review of age/gender differences and cardiovascular outcomes, Int. J. Cardiol. (2015) 172–177.
[13] S.A. Strickberger, J. Ip, S. Saksena, K. Curry, T.D. Bahnson, P.D. Ziegler, Relationship between atrial tachyarrhythmias and symptoms, Heart Rhythm (2005) 125–131.
[14] C.W. Israel, G. Grönefeld, J.R. Ehrlich, Y.G. Li, S.H. Hohnloser, Long-term risk of recurrent atrial fibrillation as documented by an implantable monitoring device: implications for optimal patient care, J. Am. Coll. Cardiol. (2004) 47–52.
[15] J. Slocum, A. Sahakian, S. Swiryn, Diagnosis of atrial fibrillation from surface electrocardiograms based on computer-detected atrial activity, J. Electrocardiol. (1992) 1–8.
[16] S. Ladavich, B. Ghoraani, Rate-independent detection of atrial fibrillation by statistical modeling of atrial activity, Biomed. Signal Process. Contr. (2015) 274–281.
[17] N. Larburu, T. Lopetegi, I. Romero, Comparative study of algorithms for atrial fibrillation detection, Comput. Cardiol. (2011) 265–268.
[18] K. Tateno, L. Glass, Automatic detection of atrial fibrillation using the coefficient of variation and density histograms of RR and ΔRR intervals, Med. Biol. Eng. Comput. (2001) 664–671.
[19] S. Dash, K.H. Chon, S. Lu, E.A. Raeder, Automatic real time detection of atrial fibrillation, Ann. Biomed. Eng. (2009) 1701–1709.
[20] C. Huang, S. Ye, H. Chen, D. Li, F. He, Y. Tu, A novel method for detection of the transition between atrial fibrillation and sinus rhythm, IEEE Trans. Biomed. Eng. (2011) 1113–1119.
[21] D.E. Lake, J.R. Moorman, Accurate estimation of entropy in very short physiological time series: the problem of atrial fibrillation detection in implanted ventricular devices, Am. J. Physiol. Heart Circ. Physiol. (2011) 319–325.
[22] J. Lee, B.A. Reyes, D.D. McManus, O. Mathias, K.H. Chon, Atrial fibrillation detection using an iPhone 4S, IEEE Trans. Biomed. Eng. (2013) 203–206.
[23] X. Zhou, H. Ding, B. Ung, E. Pickwell-MacPherson, Y. Zhang, Automatic online detection of atrial fibrillation based on symbolic dynamics and Shannon entropy, Biomed. Eng. Online (2014) 18.
[24] S. Babaeizadeh, R.E. Gregg, E.D. Helfenbein, J.M. Lindauer, S.H. Zhou, Improvements in atrial fibrillation detection for real-time monitoring, J. Electrocardiol. (2009) 522–526.
[25] K. Jiang, C. Huang, S. Ye, H. Chen, High accuracy in automatic detection of atrial fibrillation for Holter monitoring, J. Zhejiang Univ. Sci. B (2012) 751–756.
[26] S. Asgari, A. Mehrnia, M. Moussavi, Automatic detection of atrial fibrillation using stationary wavelet transform and support vector machine, Comput. Biol. Med. (2015) 132–142.
[27] S.C. Lee, Using a translation-invariant neural network to diagnose heart arrhythmia, in: Images of the Twenty-First Century, Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (1996) 2025–2026.
[28] P. de Chazal, R.B. Reilly, A patient-adapting heartbeat classifier using ECG morphology and heartbeat interval features, IEEE Trans. Biomed. Eng. (2006) 2535–2543.
[29] H. Goh, N. Thome, M. Cord, J.H. Lim, Learning deep hierarchical visual feature coding, IEEE Trans. Neural Networks Learn. Syst. (2014) 2212–2225.
[30] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444.
[31] P. Xiong, H. Wang, M. Liu, S. Zhou, Z. Hou, X. Liu, ECG signal enhancement based on improved denoising auto-encoder, Eng. Appl. Artif. Intell. (2016) 194–202.
[32] L. Cohen, Time-frequency analysis: theory and applications, J. Acoust. Soc. Am. 134 (5) (1995) 4002.
[33] P.R. Gomes, F.O. Soares, J.H. Correia, C.S. Lima, ECG data-acquisition and classification system by using Wavelet-domain Hidden Markov Models, in: 2010 Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC'10), 2010, pp. 4670–4673.
[34] A. Salmanpour, L.J. Brown, J.K. Shoemaker, Performance analysis of stationary and discrete wavelet transform for action potential detection from sympathetic nerve recordings in humans, in: 2008 30th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., 2008, pp. 2932–2935.
[35] I.J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, Y. Bengio, Maxout networks, Comput. Sci. (2013) 1319–1327.
[36] A.L. Goldberger, L.A.N. Amaral, L. Glass, J.M. Hausdorff, P.C. Ivanov, R.G. Mark, J.E. Mietus, G.B. Moody, C.-K. Peng, H.E. Stanley, PhysioBank, PhysioToolkit, and PhysioNet, Circulation (2000) 215–220.
[37] B. Weng, J.J. Wang, F. Michaud, M. Blanco-Velasco, Atrial fibrillation detection using stationary wavelet transform analysis, Conf. Proc. IEEE Eng. Med. Biol. Soc. (2008) 1128–1131.
[38] M. Stridh, L. Sörnmo, C.J. Meurling, S.B. Olsson, Sequential characterization of atrial tachyarrhythmias based on ECG time-frequency analysis, IEEE Trans. Biomed. Eng. (2004) 100–114.
[39] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, T. Darrell, Caffe: convolutional architecture for fast feature embedding, ACM Int. Conf. Multimed. (2014) 675–678.
[40] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Adv. Neural Inf. Process. Syst. (2012) 1–9.