Efficient Deep Learning Based Hybrid Model To Detect Obstructive Sleep Apnea

sensors

Article
Efficient Deep Learning Based Hybrid Model to Detect
Obstructive Sleep Apnea
Prashant Hemrajani 1 , Vijaypal Singh Dhaka 1 , Geeta Rani 1 , Praveen Shukla 1 and Durga Prasad Bavirisetti 2, *

1 Computer and Communication Engineering, Manipal University Jaipur, Jaipur 303007, Rajasthan, India
2 Department of Computer Science, Norwegian University of Science and Technology,
7034 Trondheim, Norway
* Correspondence: durga.bavirisetti@ntnu.no

Abstract: An increasing number of patients and a lack of awareness about obstructive sleep apnea are points of concern for the healthcare industry. Polysomnography is recommended by health experts to detect obstructive sleep apnea. The patient is connected to devices that track patterns and activities during sleep. Polysomnography, being a complex and expensive process, cannot be adopted by the majority of patients. Therefore, an alternative is required. Researchers have devised various machine learning algorithms using single-lead signals such as electrocardiogram, oxygen saturation, etc., for the detection of obstructive sleep apnea. These methods have low accuracy, low reliability, and high computation time. Thus, the authors introduce two different paradigms for the detection of obstructive sleep apnea. The first is MobileNet V1, and the other is the convergence of MobileNet V1 with two separate recurrent neural networks, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). They evaluate the efficacy of their proposed method using authentic medical cases from the PhysioNet Apnea-Electrocardiogram database. The MobileNet V1 model achieves an accuracy of 89.5%, the convergence of MobileNet V1 with LSTM achieves an accuracy of 90%, and the convergence of MobileNet V1 with GRU achieves an accuracy of 90.29%. The obtained results demonstrate the superiority of the proposed approach over the state-of-the-art methods. To showcase the implementation of the devised methods in a real-life scenario, the authors design a wearable device that monitors ECG signals and classifies them into apnea and normal. The device employs a security mechanism to transmit the ECG signals securely over the cloud with the consent of patients.
Keywords: MobileNet V1; Long Short Term Memory; obstructive sleep apnea; single-lead ECG; Gated Recurrent Unit

Citation: Hemrajani, P.; Dhaka, V.S.; Rani, G.; Shukla, P.; Bavirisetti, D.P. Efficient Deep Learning Based Hybrid Model to Detect Obstructive Sleep Apnea. Sensors 2023, 23, 4692. https://doi.org/10.3390/s23104692

Academic Editors: Jérôme Rossignol and Nebojsa Bacanin
Received: 15 February 2023
Revised: 23 April 2023
Accepted: 25 April 2023
Published: 12 May 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

In humans, sleeping is a universal function, comprising thirty-three percent of their life. Sleep deficiency results in dysfunction in most body functions. A sleep disorder may present as a problem associated with aberrant sleep deficiency [1,2]. Sleep apnea is a severe problem brought on by sleep disturbances. According to the American Academy of Sleep Medicine (AASM), it is a sleep disorder in which patients experience breathing pauses while they are asleep [3]. The three different types of sleep apnea are central sleep apnea (CSA), obstructive sleep apnea (OSA), and complex sleep apnea [4]. In OSA, airflow during sleep stops for 10 s or more because of a blockage in the upper airway [5,6]. Central sleep apnea arises when the brain's respiratory center, which regulates breathing, disrupts airflow during sleep. The phrase “complex sleep apnea” refers to a condition in which a patient suffers from both obstructive and central sleep apnea [2]. OSA is a common problem for humans. Drowsiness upon awakening, daytime tiredness, and snoring indicate OSA [7,8]. According to [9], a study conducted on Indians in 2015 found insomnolence in nearly 93% of people. The prevalence of OSA has


been observed to be higher in the country’s western region. According to the report, the prevalence of OSA varies between 4.4% and 19.7% for males and between 2.5% and 7.4% for females [10]. OSA is also becoming more widely recognized as a primary cause of high blood pressure, cerebrovascular disease, and cardiovascular disease [6,11]. The Apnea-Hypopnea Index (AHI) is a clinical assessment of OSA severity that counts apnea and hypopnea occurrences during a specific time while sleeping [7]. Polysomnography (PSG) is the “gold standard” for identifying OSA in clinical terms [8,11]. To detect sleep problems, patients must wear several wires and sensors on their bodies for one or more nights and rely on specialized laboratories and personnel. PSG is an expensive and time-consuming procedure that is tough to implement. As a result, researchers have proposed several single-lead signal-based OSA detection methods, because single signals such as the ECG can pinpoint the occurrence of OSA in diagnosed patients: a decrease in blood oxygen levels forces the cardiovascular system to work harder to maintain an adequate oxygen level throughout the body. In earlier studies, the majority of researchers relied on standard machine learning algorithms and CNN models, both of which lacked the ability to extract features under varying sizes and channels [8,9,12]. To improve the accuracy of the classification of obstructive sleep apnea, the proposed research seeks to optimize the features that a convolutional neural network extracts from the electrocardiogram (ECG) signal and to combine them with a recurrent neural network (LSTM). This is the novel aspect of the research. Our key contributions are as follows:
1. Two OSA detection models are proposed, first using MobileNet V1, which offers
improved accuracy over the state-of-the-art on the same data set.
2. The second OSA detection model is proposed by integrating MobileNet V1 with
two different RNNs (LSTM and GRU) used separately to provide two distinct deep-
learning models. This model offers the highest accuracy, specificity, and sensitivity
over the state-of-the-art on the same data set.
3. A secured wearable device to detect and classify ECG signals of the patient as apneic or not. A security mechanism ensures that the data received from the sensors are not left open to any mishandling.
The rest of this paper is laid out as follows. The related works will be described in
Section 2. The preprocessing methods, training parameters, and proposed algorithms are
all described in Section 3. In Section 4, the dataset and results are shown. Section 5 contains
the conclusion.

2. Related Works
Most single-lead obstructive sleep apnea (OSA) detection research relies on pulse oximetry and ECG-based signals. Information extraction (e.g., frequency-domain, time-domain, and other variables) and the discovery of patterns and trends are used to determine and predict OSA occurrence accurately. A cumulative study of previous work conducted on the Physionet Apnea-ECG database [13], using Lead-II ECG signals (35 withheld + 35 released), is shown in Table 1.
Changyue Song et al. used a discriminative Hidden Markov Model (HMM) to detect OSA from ECG signals [14]. However, this approach could not indicate the severity level of an OSA episode; its result is limited to a Boolean value without further elaboration. Another study, based on DNN and HMM, was performed by Kunyang Pan et al. using a single-lead ECG signal [7]. Different classifiers (support vector machines (SVM), ANN, and HMM) increased the method’s performance. The decision fusion algorithm and Newton method were also used [11]. The disadvantage of this research is the absence of classification and illness detection.
To extract multiple features from the RR interval (RRI) sequence, Qi Shen et al. [8] used a methodology based on a multiscale dilation attention 1-D convolutional neural network model, a multiscale feature extraction algorithm, and classifiers with weighted loss and time-dependence (WLTD) [15]. Single-lead ECG signals were employed, which was helpful since they are more useful in wearable devices than in medical monitoring systems [8]. Unfortunately, the network model struggled to automatically extract characteristics from the original ECG data, necessitating substantial manual intervention. Deep learning is employed for classification, while Long Short-Term Memory (LSTM) recurrent networks are applied for feature extraction. The 2D-CNN model and the LSTM were used to recover the spatial and temporal properties [16].
Kaicheng et al. proposed a single-lead ECG model that uses unsupervised feature learning to detect sleep apnea. The frequential stacked sparse auto-encoder (FS-SAE) and the time-dependent cost-sensitive (TDCS) classification served as the model’s foundation, and the Hidden Markov Model (HMM) was used to create it [17].
Gregoire Surrel et al. proposed an energy-efficient wearable device based on time-domain analysis of single-channel ECG signals. Through its Bluetooth connection, the device can transfer its results to a website for constant monitoring and tracking of the progression of the ailment [18].
In addition, Singh et al. [19] employed a continuous wavelet transform (CWT) to construct a two-dimensional scalogram image from each one-minute ECG segment. They then analyzed these images with CNN and AlexNet models.
Tao Wang et al. devised a method that employs a time window artificial neural network to model the temporal dependency between ECG signal segments without requiring any prior assumptions about the training data distribution [12].
Some authors have also worked on OSA detection using other techniques; Table 2 shows some of these results. A maximum accuracy of 97.5% was achieved by [7] with the help of a random forest classifier, whereas the lowest accuracy of 80.5% was exhibited by [20], who used single-channel ECG and hybrid ML models. These studies used the Physionet Apnea dataset with combinations of released and withheld data other than 35 + 35 to produce their results.

Table 1. Cumulative study of previous work conducted on the Physionet Apnea-ECG database, using Lead-II ECG signals (35 withheld + 35 released).

| Year | Authors | Technique Used | Application | Features | Accuracy |
|---|---|---|---|---|---|
| 2017 [21] | Gutta, S., Cheng, Q., Nguyen, H.D. and Benjamin, B.A. | Hidden Markov Model | OSA Detection | Temporal dependence on signals using Hidden Markov Model (HMM) | 82.33% |
| 2018 [9] | Wang, L., Lin, Y. and Wang, J. | DNN and HMM using single lead ECG signal | OSA Detection | Hidden Markov Model (HMM) and Deep Neural Network (DNN) | 85% |
| 2018 [18] | Surrel, G., Aminifar, A., Rincon, F. and Murali, S. | SVM | OSA Detection | RR Interval and RS amplitude time series | 88.2% |
| 2019 [19] | Singh, S.A. and Majumder, S. | AlexNet model, CNN | OSA Detection | Time-Frequency Scalogram | 86.22% |
| 2019 [22] | Lu, C. and Shen, G. | Time window artificial neural network (TW-MLP) | SA Detection | RR Interval and R-Peak Amplitude | 87.3% |
| 2020 [17] | Feng, K., Qin, H., Wu, S., Pan, W. and Liu, G. | Unsupervised feature learning, single lead ECG | SA Detection | Frequential stacked sparse autoencoder (FSSAE) | 85.1% |
| 2021 [8] | Shen, Q., Qin, H., Wei, K. and Liu, G. | Multiscale deep neural network | OSA Detection | Multiscale Dilation Attention Convolution | 89.4% |
Table 2. A comparative study of multiple other methods on the Physionet Apnea-ECG database, using Lead-II ECG signals other than (35 withheld + 35 released).

| Year | Authors | Technique Used | Application | Features | Accuracy |
|---|---|---|---|---|---|
| 2020 [23] | Bozkurt, F., Uçar, M.K., Bozkurt, M.R. and Bilgin, C. | Single channel ECG and hybrid ML Model | Detection of abnormal respiratory events with obstructive sleep apnea | Fisher Feature Selection Algorithm, Principal Component Analysis | 85.12% |
| 2020 [5] | Mc-Clure, K., Erdreich, B., Bates, J.H.T., McGinnis, R.S., Masquelin, A. and Wshah, S. | Multiscale Deep Neural Network and 1DCNN with wireless sensors | OSA Detection | 1-D Convolutional Neural Network | 86% |
| 2019 [20] | Stretch, R. et al. | Machine Learning (KNN, SLR, ANN, SVM, Gradient boosted decision tree, etc.) | OSA Detection | Least Absolute Shrinkage and Selection Operator (LASSO) and Ridge Regression | 80.5% |
| 2019 [19] | Singh, S.A. and Majumder, S. | AlexNet | Sleep Apnea Detection | Deep Neural Network | 86.22% |
| 2019 [24] | Liang, X., Qiao, X. and Li, Y. | CNN and LSTM | OSA Detection | RR Interval | 99.8% |
| 2017 [21] | Gutta, S., Cheng, Q., Nguyen, H.D. and Benjamin, B.A. | Vector valued Gaussian processes (GPs) | OSA Detection | RR Interval | 82.33% |
| 2015 [7] | Song, C., Liu, K., Zhang, X., Chen, L. and Xian, X. | Hidden Markov Model | OSA Detection | RR Interval | 97.1% |

3. Methods and Material


3.1. Dataset Description
This paper uses the Apnea-ECG database from PhysioNet, which contains 70 single-lead ECG recordings, to train and evaluate the suggested techniques. The data were sampled at 100 Hz with a resolution of 16 bits (stored with the least significant byte first in each pair) and nominally 200 A/D units per millivolt. This results in a database having 34,428 min of data and 34,230 min of annotations. The length of each recording varies from just less than 7 h to approximately 10 h. The occurrence of OSA in each individual record is predetermined using other metrics and signals (e.g., apnea annotations manually derived based on synchronously recorded respiration and related signals, computer-generated QRS annotations, chest, and polysomnographic (PSG) data), but no determination on the occurrence of hypopnea or apnea was made. There are two types of datasets among the 70 records obtained: the withheld dataset (nomenclature a01–a20, b01–b05, c01–c10) and the released dataset (nomenclature x01–x35). The withheld set was used to train the model, while the released set was used to validate it [13].

3.2. Pre-Processing of Data


Single-lead ECG measurements were performed on patients for 7 to 10 h. Following that, the ECG data were separated into 60-s segments for analysis. Signals containing distorted waveforms and samples with a total duration of less than one minute were eliminated from the data set used to construct the model. Furthermore, noisy segments in the remaining dataset were discarded using an algorithmic weight calculation method, in which the auto-correlation function of each segment was measured with a 60 s time delay. Only about 3 to 5 percent (1404) of the segments were eliminated during the overall cleaning procedure. The dataset includes 70 patients and is split into two parts: the first contains the data of 35 patients for training, and the second contains the data of 35 patients for testing and validation to generate the study’s results. Figure 1 illustrates the steps involved in the preprocessing of the data.

Figure 1. Preprocessing of Physionet Apnea-ECG database.
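
The segmentation step can be illustrated with a short sketch. It assumes the 100 Hz sampling rate from Section 3.1 and the 60-s segment length stated above; the function name `segment_ecg` and the synthetic input are hypothetical, and the noise-rejection step based on auto-correlation is not reproduced because its details are not given here.

```python
import numpy as np

FS_HZ = 100            # sampling rate reported for the Apnea-ECG recordings
SEGMENT_SECONDS = 60   # one-minute analysis segments
SAMPLES_PER_SEGMENT = FS_HZ * SEGMENT_SECONDS  # 6000 samples per segment


def segment_ecg(signal: np.ndarray) -> np.ndarray:
    """Split a 1-D ECG signal into non-overlapping one-minute segments.

    Trailing samples that do not fill a whole minute are dropped,
    mirroring the removal of samples shorter than one minute.
    (Hypothetical helper, not the authors' code.)
    """
    n_segments = len(signal) // SAMPLES_PER_SEGMENT
    trimmed = signal[: n_segments * SAMPLES_PER_SEGMENT]
    return trimmed.reshape(n_segments, SAMPLES_PER_SEGMENT)


# Synthetic stand-in for a recording: 2.5 minutes of samples.
fake_recording = np.arange(15000, dtype=float)
segments = segment_ecg(fake_recording)
print(segments.shape)  # two complete one-minute segments
```

Each row of the returned array is one candidate segment that a later cleaning step could accept or discard.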

3.3. Training Parameters
3.3.1. Optimizer
An optimizer is a function/algorithm that modifies the attributes of a neural network (e.g., weights, learning rate) and serves to minimize the loss function, thus improving accuracy. The weights are initialized using a variety of techniques and modified with each epoch according to the update equation [25,26].
Adaptive first- and second-order moment estimation is used in the stochastic gradient descent method known as Adam optimization. Instead of employing the standard stochastic gradient descent technique, it may regularly modify weights in the network based on training statistics [26]. The gradient descent approach can be sped up by using the gradients’ exponentially weighted average. Stochastic gradient descent operates a single learning rate (alpha) for all weight changes [27,28], and this rate remains constant throughout training. In Adam, by contrast, the learning rate of each network weight (parameter) is adjusted individually as learning progresses. Adam extends the capabilities of stochastic gradient descent by combining the benefits of two previous optimization methodologies: Root Mean Square Propagation, which maintains per-parameter learning rates that are tailored based on a set of boundaries (e.g., how quickly it is changing), and the Adaptive Gradient Algorithm/Momentum, which improves performance on problems with sparse gradients (such as natural language processing and computer vision problems) [29]. Both non-stationary and stationary problems can be tackled using this method.
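
As an illustration of the moment estimates described above, the following is a minimal sketch of a single Adam update in NumPy. The learning rate of 0.001 matches Table 3; the decay rates `beta1`, `beta2` and the constant `eps` are the commonly used defaults and are assumptions here, as is the helper name `adam_step`.

```python
import numpy as np


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Perform one Adam update (illustrative sketch, assumed default constants).

    m and v are the running first- and second-moment estimates of the
    gradient; t is the 1-based step counter used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad        # exponentially weighted mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # exponentially weighted mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter scaled step
    return w, m, v


# Toy usage: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    w, m, v = adam_step(w, 2 * (w - 3.0), m, v, t, lr=0.1)
print(w)  # approaches 3
```

Because the step is divided by the square root of the second-moment estimate, each parameter effectively receives its own learning rate, which is the contrast with plain stochastic gradient descent drawn in the text.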

3.3.2. Activation Function


An activation function in a neural network defines how the weighted sum of a neuron’s inputs, together with its bias, is converted into the neuron’s output in a layer of the network [30,31]. This is important for converting what would essentially be a linear regression model into a capable neural network that can learn nonlinear relationships and perform more complex tasks. It works in conjunction with backpropagation, because the gradients are supplied along with the error to update the weights and biases [30,32]. As a result, it is critical to the network’s success, and multiple activation functions are frequently employed for different portions of the model. The rectified linear activation function (ReLU), a piecewise linear function used in this work, returns zero for negative input and returns the input directly for positive input. It is expressed in Equation (1):

f (x) = max(0, x) (1)

where f (x) equals 0 when x is less than zero, and f (x) equals x when x is greater than or equal to zero. ReLU is a common default activation for building multilayer perceptrons and convolutional neural networks since it mitigates the vanishing gradient problem, enabling models to train more quickly and perform better. Stochastic gradient descent with backpropagation of errors is needed to train deep neural networks. Although ReLU appears to behave and look like a linear function, it actually possesses nonlinear properties that can be utilized to uncover intricate data correlations [29]. Additionally, it reduces saturation while increasing sensitivity to overall activity. When employing backpropagation to train a neural network, the function is linear for values greater than zero, which gives it many of the same advantages as a linear activation function. Negative values are always output as zero, which is what makes the function nonlinear.
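
Equation (1) translates directly into code. The sketch below uses NumPy, and the function name `relu` is chosen only for illustration.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)


print(relu(np.array([-2.0, 0.0, 3.0])))  # negative input maps to 0, positive input passes through
```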

3.3.3. Weight Initializer


Prior to training neural network models on a dataset, weight initialization is used to specify the initial values of the parameters, which are then repeatedly updated as the model trains [33]. This process of weight initialization has a significant effect on the classification performance of a model. Its goal is to prevent layer activation outputs from exploding or vanishing during the forward passes of a deep neural network. To update weights, the proposed model employs the backpropagation mechanism, whereby current weights depend on prior weights. This rules out setting all the weights to zero, which would give every neuron in a layer the same update and prevent the network from learning. Instead, the weights are drawn from a normal distribution with a mean of 0 and a standard deviation of 1.
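
A minimal sketch of this initialization, assuming NumPy; the helper name `init_weights`, the layer sizes, and the fixed seed are illustrative choices and not taken from the paper.

```python
import numpy as np


def init_weights(n_in, n_out, seed=0):
    """Draw an (n_in, n_out) weight matrix from a normal distribution
    with mean 0 and standard deviation 1 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=1.0, size=(n_in, n_out))


W = init_weights(200, 200)
print(W.shape, W.mean(), W.std())  # sample mean near 0, sample std near 1
```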

3.3.4. Loss Function


In the context of model refinement, a loss function is a function that maps an event onto a real number representing the cost associated with that event; that is, the loss function computes the distance between the expected output and the actual output of an algorithm in order to evaluate the algorithm’s performance. Stochastic gradient descent is used to train neural networks, which necessitates the use of a loss function. The loss guides the selection of the most appropriate weights and condenses a model’s behavior into a single quantifiable measure whose decrease indicates an increase in the model’s accuracy along with improved robustness [34]. Categorical cross-entropy is a loss function used to assess the difference between two probability distributions in multiclass classification problems. It is a softmax activation followed by a cross-entropy loss. Cross-entropy loss is described mathematically in Equation (2),

$$CE = -\sum_{j=1}^{c} t_j \log\left(s_j\right) \qquad (2)$$

where $t_j$ and $s_j$ are the ground truth and the CNN score for each class j in c. Table 3 displays the hyperparameters that were taken into consideration while tuning the hybrid deep-learning models.
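
The combination of softmax and cross-entropy can be sketched as follows. This is an illustrative NumPy version using the standard definition of cross-entropy with a leading minus sign; the function names and the small constant `eps`, added to avoid taking the logarithm of zero, are assumptions.

```python
import numpy as np


def softmax(z):
    """Convert raw class scores into probabilities that sum to 1."""
    z = z - np.max(z, axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def categorical_cross_entropy(t, s, eps=1e-12):
    """CE = -sum_j t_j * log(s_j) for one-hot target t and probabilities s."""
    return -np.sum(t * np.log(s + eps), axis=-1)


# Two classes (apnea, normal) with equal raw scores give probability 0.5 each.
scores = np.array([0.0, 0.0])
target = np.array([1.0, 0.0])
loss = categorical_cross_entropy(target, softmax(scores))
print(loss)  # equals ln 2, about 0.693
```

The loss is zero only when the predicted probability of the true class is 1 and grows as that probability shrinks.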

Table 3. Hyperparameter Tuning.

| S. No | Hyperparameter | Value |
|---|---|---|
| 1 | Batch Size | 32 |
| 2 | Learning Rate | 0.001 |
| 3 | Epochs | 100 |
| 4 | Loss Function | Categorical Cross Entropy |
| 5 | Activation Function | ReLU |
| 6 | Optimizer | Adam |

3.4. Training Procedure
3.4.1. Architecture of MobileNet V1
MobileNet is a reduced depth-wise structure incorporating a convolutional layer. It offers two controllable features that efficiently trade off accuracy against latency [35]. The core structure consists of multiple abstraction layers, each of which is a component of distinct convolutions that seem to be the quantized configuration that analyses the complexity of a typical problem in great detail [36]. This architecture is ideal for use on mobile devices and computers with limited computational power because it is highly efficient with a small number of features (for example, palmprint recognition) and requires far less computational work than conventional CNN models. Detailed images depicting the aforementioned architecture can be found in the article available online [37]. It also reduces network size [38], along with depiction [39].

3.4.2. Recurrent Neural Network
A recurrent neural network (RNN) is a machine learning algorithm that processes a sequence of input samples one at a time. An RNN adapts to sequential data of varying length [40,41]. Figure 2 depicts the construction of a traditional RNN. The present data Xt are delivered to the node Z(t + 1), along with the hidden state ht of the hidden layer from the earlier stage, as shown in Figure 2. As a consequence, an RNN is a looped neural network that changes over time to allow information to persist.
Figure 2. The construction of a conventional RNN.

Given an input sequence X = [x1, x2, . . . , xT], an RNN specifies a series of hidden states ht given by Equation (3)

$$h_{t+1} = \psi\left(Z_{t+1}\right) = \psi\left(W_h h_t + W_x X_t + b\right) \qquad (3)$$

An RNN may be conceived of as numerous replicas of the same network, each forwarding a message to a successor, as represented in Figure 3.
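
The recurrence in Equation (3), unrolled over a sequence, can be sketched in NumPy as below. The choice of tanh for the activation ψ, the zero initial hidden state, and the function name `rnn_forward` are assumptions made for illustration; the source does not specify them.

```python
import numpy as np


def rnn_forward(X, W_h, W_x, b, h0=None, psi=np.tanh):
    """Apply h_{t+1} = psi(W_h h_t + W_x x_t + b) over a sequence.

    X has shape (T, input_dim); the result has shape (T, hidden_dim),
    one hidden state per time step. psi defaults to tanh (an assumption).
    """
    hidden_dim = W_h.shape[0]
    h = np.zeros(hidden_dim) if h0 is None else h0
    states = []
    for x_t in X:                       # the same weights are reused at every step
        z = W_h @ h + W_x @ x_t + b     # Z_{t+1} in Equation (3)
        h = psi(z)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))             # sequence of 5 steps, 3 features each
W_h = rng.normal(size=(4, 4))
W_x = rng.normal(size=(4, 3))
b = np.zeros(4)
print(rnn_forward(X, W_h, W_x, b).shape)  # (5, 4)
```

Reusing `W_h`, `W_x`, and `b` inside the loop is what the "replicas of the same network" picture expresses: each step is the same cell, fed the previous hidden state.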
Figure 3. A complex RNN structure.


RNNs were created to model sequential data. However, because of the common difficulty of vanishing/exploding gradients, training RNNs using stochastic gradient descent (SGD) is fairly difficult [42]. The exploding gradient issue is comparatively simple to solve by restricting the norm of the gradients. On the other hand, architectural advancements such as LSTM, GRU, and iRNN/uRNN [21,42] may aid with the issue of vanishing gradients.
3.4.3. Long Short-Term Memory
Several theoretical and practical articles on this kind of Recurrent Neural Network (RNN) have been published since the first LSTM research in 1997 [43]. Several researchers have remarked upon its exceptional outcomes on sequential datasets such as text, language modeling, video, and speech-to-text transcription [21,44]. Influenced mainly by the excellent results discussed in the literature, numerous readers in academic and commercial contexts seek to understand more about the Long Short-Term Memory network (hereafter, “the LSTM network”) to assess its relevance to their research or practical application. Efficient, production-ready implementations of several RNN and LSTM networks are available in all leading accessible machine-learning platforms. The ease of use and low cost of development and testing help many experimenters, particularly those new to RNN/LSTM systems.
On the other hand, others are keener on delving further into every component of the system’s working. Taking the longer route allows you to develop intuition that will assist you with data preparation, troubleshooting, and adapting an open-source module to fulfill the requirements of your academic project or commercial solution. In most cases, such a task expands to include reading a slew of documents, blog entries, and implementation guides to gain an “A to Z” knowledge of the system’s core principles and operations, only to discover that the vast majority of the resources leave one or more critical questions about the fundamentals unanswered. Unrolling is frequently presented without justification. The Recurrent Neural Network (RNN), a generic category of neural network that precedes and includes the LSTM network as an example, is commonly presented without reason. Furthermore, the training equations are frequently omitted, leaving the reader bewildered and needing additional materials to reconcile the numerous notations used.
The LSTM network tackles the problem of unstable gradients by enabling the network to learn long-term dependencies. Zaremba’s [45] LSTM is described here in depth. Let “U” denote the linear convolution transformation of the current input, and let “W” denote the linear convolution transformation of the prior output. At step “n”, the input, cell state, and hidden state are denoted “xn”, “cn”, and “hn”. The hidden state hn and cell state cn are computed for the current input xn as shown in Equations (4)–(6),
$$j = \tanh(U x_n + W h_{n-1} + b) \tag{4}$$
$$c_n = f \odot c_{n-1} + i \odot j \tag{5}$$
$$h_n = o \odot \tanh(c_n) \tag{6}$$
where b represents all bias terms, j is the candidate update of the cell state, and $\odot$ signifies elementwise multiplication. The forget gate, input gate, and output gate are represented by the letters f, i, and o, as shown in Equation (7),
$$f, i, o = \sigma(U x_n + W h_{n-1} + b) \tag{7}$$
The sigmoid activation function returns a number between 0 and 1. Finally, “hn” represents the LSTM layer’s output at iteration n. The fundamental construction of the LSTM memory block is demonstrated in Figure 4.

Figure 4. Basic LSTM Structure.
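The update in Equations (4)–(7) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the names `lstm_step`, `affine`, and `params` are hypothetical, each gate is assumed to have its own U, W, and b, and the toy weights are all zero so that the arithmetic is easy to follow by hand.

```python
import math


def sigmoid(v):
    """Logistic function; returns a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, x, h_prev):
    """Compute U x + W h_prev + b for one gate; U and W are lists of rows."""
    U, W, b = weights
    return [
        sum(u * xi for u, xi in zip(u_row, x))
        + sum(w * hi for w, hi in zip(w_row, h_prev))
        + b_k
        for u_row, w_row, b_k in zip(U, W, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; params maps "cand", "f", "i", "o" to (U, W, b) triples.

    Giving every gate its own weights is an assumption of this sketch.
    """
    cand = [math.tanh(v) for v in affine(params["cand"], x, h_prev)]  # Eq. (4)
    f = [sigmoid(v) for v in affine(params["f"], x, h_prev)]          # Eq. (7)
    i = [sigmoid(v) for v in affine(params["i"], x, h_prev)]          # Eq. (7)
    o = [sigmoid(v) for v in affine(params["o"], x, h_prev)]          # Eq. (7)
    # Eq. (5): keep part of the old cell state, add part of the candidate.
    c = [fk * ck + ik * jk for fk, ck, ik, jk in zip(f, c_prev, i, cand)]
    # Eq. (6): the output gate scales the squashed cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Toy example (hypothetical values): input size 1, hidden size 2, zero weights.
zero_gate = ([[0.0], [0.0]], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
params = {name: zero_gate for name in ("cand", "f", "i", "o")}
h, c = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], params)
```

With zero weights every gate evaluates to 0.5 and the candidate to 0, so the step simply halves the previous cell state; trained weights would instead let the gates decide how much to keep and how much to overwrite.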

X. Liang et al. [21] used the unfolding of Bidirectional LSTM networks (BLSTM). In a BLSTM, two distinct LSTM networks govern the forward and backward passes over the sequence. This structure combines prior and upcoming sequential information in real-time. Figure 4 depicts the LSTM in its most basic form. W. Yang et al. [46] used the MIT-BIH polysomnography database [13] to test their technique. They took 14 recordings, including one whose signal was detected using a nasal thermistor. However, this database has no annotations for the start and end positions of events; it only offers annotations per 30 s epoch. The events were therefore tagged as 30 s epoch annotations, and the database’s single respiration signal was sent to the LSTM network. The method’s recall on the MIT-BIH polysomnography database is 90.0%, 87.1%, and 83.2% for normal, apnea, and hypopnea episodes, respectively.

3.4.4. Gated Recurrent Unit
After MobileNet V1, the Gated Recurrent Unit (GRU), a variant of the standard recurrent unit, was used in place of the LSTM. A GRU has no separate cell state; it has only a hidden state. GRUs can be trained more quickly because of their more straightforward architecture. GRUs are able to store and filter the data with the help of their update and reset gates. Keeping the crucial information and transmitting it to the network’s subsequent time steps, rather than discarding it with each fresh input, mitigates the vanishing gradient problem [47].
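To make the roles of the update and reset gates concrete, the following plain-Python sketch implements one step of a commonly used GRU formulation. It is an illustration under stated assumptions, not the authors' code: the names `gru_step`, `matvec`, and `params` are hypothetical, and the blending convention h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h′_t is one of two conventions in use (some libraries swap the roles of z_t and 1 − z_t).

```python
import math


def sigmoid(v):
    """Logistic function; returns a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gru_step(x, h_prev, params):
    """One GRU step; params maps "z", "r", "h" to (U, W, b) triples."""

    def layer(name, h_in, activation):
        U, W, b = params[name]
        pre = [ux + wh + bk for ux, wh, bk in zip(matvec(U, x), matvec(W, h_in), b)]
        return [activation(v) for v in pre]

    z = layer("z", h_prev, sigmoid)   # update gate: how much new content to take
    r = layer("r", h_prev, sigmoid)   # reset gate: how much of the past to expose
    gated_past = [rk * hk for rk, hk in zip(r, h_prev)]
    h_cand = layer("h", gated_past, math.tanh)  # candidate hidden state
    # Blend the previous hidden state with the candidate (assumed convention).
    return [(1.0 - zk) * hk + zk * ck for zk, hk, ck in zip(z, h_prev, h_cand)]


# Toy example (hypothetical values): input size 1, hidden size 2, zero weights.
zero_layer = ([[0.0], [0.0]], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
params = {name: zero_layer for name in ("z", "r", "h")}
h_new = gru_step([0.3], [2.0, -4.0], params)
```

Note that the step carries only a hidden state from one time step to the next, which is the structural difference from the LSTM described in the text.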

3.4.5. Proposed Algorithm


MobileNet V1
MobileNet V1 is built from depthwise separable convolutions. A depthwise separable convolution has two parts. The first is a depthwise convolution, which applies a single filter to each input channel, so the number of output channels equals the number of input channels. The second is a pointwise convolution with a 1 × 1 kernel, which merely combines the features produced by the depthwise convolution. The procedure is shown in Algorithm 1.

Algorithm 1: Training Procedure of the MobileNet V1


begin
(1) Select “n” data as training samples in order.
(2) Apply the Depthwise Convolution operation with a kernel of size D_K × D_K.
(3) After Step 2, apply the Pointwise Convolution (1 × 1 kernel) to combine the depthwise features and reduce the dimension.
(4) Apply Batch Normalization and ReLU after each convolution.
(5) Introduce the Width Multiplier α to control the total number of channels (channel depth): M becomes αM. The value of α ranges from 0 to 1, with standard settings of 1, 0.75, 0.5, and 0.25.
(6) For α = 1, the network is the baseline MobileNet V1. The number of parameters and the computing cost are both reduced roughly quadratically, by a factor of about α², with accuracy dropping off smoothly from α = 1 to 0.5, until α = 0.25, which is too small. To control the network’s input resolution, apply the Resolution Multiplier “ρ”, which ranges from 0 to 1.
(7) For ρ = 1, the network is the baseline MobileNet V1.
end
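The saving from depthwise separable convolutions, and the effect of the two multipliers, can be checked with a short calculation. The sketch below uses the standard MobileNet V1 cost expressions for 2-D feature maps (kernel size D_K, input channels M, output channels N, feature-map size D_F); the function names and the layer sizes in the example are hypothetical and are not taken from the paper.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a standard convolution: D_K^2 * M * N * D_F^2."""
    return d_k * d_k * m * n * d_f * d_f


def separable_conv_cost(d_k, m, n, d_f, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise separable convolution.

    alpha (width multiplier) scales the channel counts M and N;
    rho (resolution multiplier) scales the feature-map size D_F.
    """
    m_a, n_a, d_r = alpha * m, alpha * n, rho * d_f
    depthwise = d_k * d_k * m_a * d_r * d_r   # one D_K x D_K filter per channel
    pointwise = m_a * n_a * d_r * d_r         # 1 x 1 convolution across channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels, 56 x 56 feature map.
baseline = separable_conv_cost(3, 64, 128, 56)
ratio_vs_standard = baseline / standard_conv_cost(3, 64, 128, 56)
ratio_half_width = separable_conv_cost(3, 64, 128, 56, alpha=0.5) / baseline
```

For this layer `ratio_vs_standard` equals 1/N + 1/D_K², and halving the width (α = 0.5) brings the cost down to a little more than a quarter of the baseline, which is the "roughly α²" behavior described in Step 6.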

MobileNet V1 + LSTM
The proposed model combines MobileNet V1 with an LSTM. The ECG signal is the input fed to the model, and the output is produced by the combination of the two networks. The working of MobileNet V1 with LSTM is given in Algorithm 2.

Algorithm 2: Training Procedure of the MobileNet V1 + LSTM


begin
(1) Select 35 data as training samples in order.
(2) Depthwise convolution cost: D_F² × M × D_K²
(3) Pointwise convolution cost: M × N × D_F²
(4) Apply Batch Normalization and ReLU after each convolution.
(5) Introduce the Width Multiplier α.
(6) For α = 1, the network is the baseline MobileNet V1.
(7) For ρ = 1, the network is the baseline MobileNet V1.
LSTM
(8) Input h_{t−1}, C_{t−1}, and x_t.
(9) Feed h_{t−1} and x_t to the first sigmoid layer (forget gate).
(10) Multiply the output of the forget gate, which lies in [0, 1], by C_{t−1}.
(11) Feed h_{t−1} and x_t to the second sigmoid layer (input gate).
(12) The tanh layer creates a candidate vector C̃_t.
(13) Pointwise multiplication i_t × C̃_t.
(14) The forget gate output is multiplied with the previous cell state, f_t × C_{t−1}; adding the result of Step 13 gives the new cell state C_t.
(15) The output is determined by the sigmoid layer, while the tanh layer scales the cell state to the range [−1, 1].
(16) To obtain the cell’s output h_t, the results of both layers are combined by pointwise multiplication.
end

MobileNet V1 + GRU
The proposed model combines MobileNet V1 with a GRU. The ECG signal is the input fed to the model, and the output is produced by the combination of the two networks. The model architecture and algorithm are quite similar to the above-mentioned approach; the only difference is the use of a GRU instead of an LSTM. Figure 5 shows the schematic architecture of the proposed MobileNet V1 + GRU model. The working of MobileNet V1 with GRU is given in Algorithm 3.

Algorithm 3: Training Procedure of the MobileNet V1 + GRU


begin
(1) Select 35 data as training samples in order.
(2) Depthwise convolution cost: D_F² × M × D_K²
(3) Pointwise convolution cost: M × N × D_F²
(4) Apply Batch Normalization and ReLU after each convolution.
(5) Introduce the Width Multiplier α.
(6) For α = 1, the network is the baseline MobileNet V1.
(7) For ρ = 1, the network is the baseline MobileNet V1.
GRU
(8) Input h_{t−1}, C_{t−1}, and x_t.
(9) Feed h_{t−1} and x_t to the first sigmoid layer.
(10) Calculate the update gate z_t.
(11) Calculate the reset gate r_t, with which the model decides what past information to forget; the new memory content then stores the relevant information from the past.
(12) The network then calculates the vector h_t, which holds the information for the current unit, and passes it down the network. The update gate determines what to collect from the current memory content h′_t and what from the previous step h_{t−1}.
(13) Pointwise multiplication i_t × C_t.
(14) To obtain the cell’s output h_t, the results of both layers are combined by pointwise multiplication.
end

Figure 5. Schematic outline of the proposed architecture of MobileNet V1 + GRU.

4. Experimental Results
4.1. Experimental Setup
The code was implemented with the TensorFlow framework on desktop PCs with an Intel(R) Core(TM) i7-6500U CPU running at 2.50 GHz, an Nvidia 940M GPU with a compute capability of 5.0, and 16 GB of RAM. On these machines, training was conducted over a limited number of epochs. The entire training was conducted on a workstation with an Nvidia RTX 2080 GPU with a compute capability of 8.60 and 11 GB of GPU RAM. To obtain the highest training and testing accuracy, each model was trained for 100 epochs.
4.2. Evaluation Index
To assess the models, the researchers employed evaluation measures such as accuracy, specificity, sensitivity (recall), precision, and the F1 score, as defined in Equations (8)–(12) [48]. The evaluation index analysis on MobileNet V1 is shown in Table 4. Removing four layers from the original MobileNet V1 model also helps to reduce the model’s computation time. Furthermore, the evaluation index analysis on MobileNet V1 + LSTM and MobileNet V1 + GRU is shown in Tables 5 and 6.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{9}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{10}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{11}$$
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{12}$$
where,
TP = True Positive
TN = True Negative
FP = False Positive
FN = False Negative
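The five measures can be computed directly from the four confusion-matrix counts. The sketch below is illustrative only: the function names are hypothetical, the counts in the example are made up, and sensitivity is computed with its standard definition TP / (TP + FN), which is also the recall used in the F1 score.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, Equation (12)."""
    return 2 * precision * recall / (precision + recall)


def evaluation_metrics(tp, tn, fp, fn):
    """Return the measures of Equations (8)-(12) as fractions in [0, 1]."""
    sensitivity = tp / (tp + fn)          # also called recall
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1_score(precision, sensitivity),
    }


# Hypothetical counts for 200 segments (not taken from the paper).
metrics = evaluation_metrics(tp=90, tn=85, fp=15, fn=10)

# Consistency check on reported values: the apnea-class precision and recall
# of Model 3 in Table 7 (90.06 and 94.71) give an F1 of 92.33 when rounded.
model3_apnea_f1 = round(f1_score(90.06, 94.71), 2)
```

Because the F1 score depends only on precision and recall, the same helper works on fractions or on percentages, as the last line shows.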

Table 4. Evaluation analysis for each segment OSA Detection with MobileNet V1.

Spec. Sens./Rec. Prec. F1 Acc.


89 90 89 91 89.5

Table 5. Evaluation analysis for each segment OSA Detection with MobileNet V1 + LSTM.

Spec. Sens./Rec. Prec. F1 Acc.


90.3 89.82 94.5 92.1 90

Table 6. Evaluation analysis for each segment OSA Detection with MobileNet V1 + GRU.

Spec. Sens./Rec. Prec. F1 Acc.


90.72 90.01 94.71 92.3 90.29

4.3. Results
The data of 70 patients used in this study have been evaluated using MobileNet V1 + LSTM
and MobileNet V1 + GRU. Table 7 shows a comparative analysis for each segment of OSA
Detection with various models, where Model 1 represents the implementation of MobileNet
V1, Model 2 represents the implementation of MobileNet V1 and LSTM, and Model 3
represents the implementation of MobileNet V1 and GRU.

Table 7. Comparative analysis for each segment OSA Detection with various models.

             Model 1                   Model 2                   Model 3
Diagnosis    Prec.  Rec.  F1-Meas.     Prec.  Rec.  F1-Meas.     Prec.  Rec.   F1-Meas.
Apnea        89     94    91           85     95    92           90.06  94.71  92.33
Normal       90     81    85           90     81    86           90.72  83.18  86.79

A true positive is a result for which the model correctly predicts the positive class. A true negative, on the other hand, is a result for which the model correctly predicts the negative class. A false positive occurs when the model wrongly predicts the positive class. A false negative is a result for which the model wrongly predicts the negative class. Table 8 shows the results of the enhanced MobileNet V1 + LSTM model (Model 2) and MobileNet V1 + GRU model (Model 3) for per segment classification.
Table 8. The outcomes of the enhanced Model 2 and Model 3 for per segment classification.

Approach    Sensitivity    Specificity    Accuracy
Model 2     89.82          90.34          90.00
Model 3     90.01          90.72          90.29
Accuracy graphs represent the model’s performance as a function of how much data and experience it is given to work with. The training and testing accuracy are both plotted on the graph to determine the occurrence and severity of overfitting, that is, learning the training data to such an extent that it clearly harms the accuracy of the results. The gap between the training and testing curves can be used to check the severity of overfitting. The greater the gap, the fewer epochs should be used to train the model. Figure 6 shows the AUC curve of MobileNet V1 + GRU. Figure 7 shows the model accuracy corresponding to epochs using MobileNet V1, whereas Figures 8 and 9 show the same for MobileNet V1 + LSTM and MobileNet V1 + GRU, respectively. The accuracy graphs of the models proposed in the paper show improvements over the existing related works, with better accuracy and low separation between training and testing accuracies.
Loss curves are a visual representation of the direction in which the learning of a CNN model proceeds, corresponding to the experience and amount of training data it is given. The curves show learning that improves at a roughly exponential rate. Figure 10 shows the variation of loss corresponding to epochs using MobileNet V1. Figures 11 and 12 show the variation of loss corresponding to epochs using MobileNet V1 + LSTM and MobileNet V1 + GRU, respectively. The training and testing loss curves of the proposed methods flatten out and run parallel, slightly apart from each other, as the model training comes to an end.
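The overfitting check described for the accuracy curves can be sketched as a small calculation on per-epoch accuracies. Everything here is illustrative: the function names are hypothetical, the accuracy values are invented for the example, and choosing the epoch with the best testing accuracy is one common heuristic rather than the procedure used in the paper.

```python
def accuracy_gaps(train_acc, test_acc):
    """Per-epoch gap between training and testing accuracy."""
    return [tr - te for tr, te in zip(train_acc, test_acc)]


def best_test_epoch(test_acc):
    """1-based index of the epoch with the highest testing accuracy."""
    return max(range(len(test_acc)), key=lambda k: test_acc[k]) + 1


# Invented curves: training accuracy keeps rising while testing accuracy stalls.
train_acc = [0.70, 0.80, 0.88, 0.93, 0.97]
test_acc = [0.68, 0.78, 0.85, 0.86, 0.84]

gaps = accuracy_gaps(train_acc, test_acc)
stop_epoch = best_test_epoch(test_acc)
```

In this made-up run the gap widens after the third epoch and the testing accuracy peaks at the fourth, which suggests that training for more epochs would only increase overfitting.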

Figure 6. AUC Curve of MobileNet + GRU.

Figure 7. Model accuracy corresponding to epochs (using MobileNet V1).

Figure 8. Model accuracy corresponding to epochs (using MobileNet V1 + LSTM).

Figure 9. Model accuracy corresponding to epochs (using MobileNet V1 + GRU).

Figure 10. Variation of loss corresponding to epochs (using MobileNet V1).

Figure 11. Variation of loss corresponding to epochs (using MobileNet V1 + LSTM).

Figure 12. Variation of loss corresponding to epochs (using MobileNet V1 + GRU).
4.4. Discussion
The time series data used in this study are one-dimensional, which makes the task significantly different from standard two-dimensional image recognition problems. Compared with the millions of training samples available in the field of image classification, the number of data samples used in this study was small, which increases the risk of overfitting. Moreover, sleep apnea detection is a binary classification problem, unlike typical image recognition. The feature maps, convolution layer strides, and fully connected layer nodes in the standard MobileNet V1 may therefore not be suitable for this setting. MobileNet V1 is adjusted as follows:
1. A one-dimensional convolution operation is used instead of a two-dimensional convolution operation for feature extraction.
2. A dropout layer between the convolution layer and the fully connected layer is added
to avoid over-fitting.
3. Only one fully connected layer is retained so as to reduce network complexity.
4. The size of the convolution layer strides and the number of fully connected layer
nodes are modified.
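The first adjustment, a one-dimensional depthwise separable convolution, can be illustrated in plain Python. This is a sketch under stated assumptions and not the authors' network: the function names are hypothetical, no padding is used, Batch Normalization is omitted for brevity, the kernel and weight values are made up, and the stride of two follows the stride the text reports for the modified MobileNet V1.

```python
def depthwise_conv1d(x, kernels, stride=2):
    """Filter each channel of x with its own kernel (no padding).

    x is a list of channels (each a list of samples); kernels holds one
    kernel per channel, so the channel count is unchanged.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        out.append([
            sum(channel[start + t] * kernel[t] for t in range(k))
            for start in range(0, len(channel) - k + 1, stride)
        ])
    return out


def pointwise_conv1d(x, weights):
    """1 x 1 convolution: weights[n][m] mixes input channel m into output n."""
    length = len(x[0])
    return [
        [sum(w * x[m][pos] for m, w in enumerate(row)) for pos in range(length)]
        for row in weights
    ]


def relu(x):
    """Elementwise ReLU on a list of channels."""
    return [[max(0.0, v) for v in channel] for channel in x]


# Made-up signal with 2 channels and 6 samples per channel.
signal = [[1, 2, 3, 4, 5, 6], [0, 1, 0, 1, 0, 1]]
depthwise = depthwise_conv1d(signal, kernels=[[1, 0, -1], [1, 1, 1]])
features = relu(pointwise_conv1d(depthwise, weights=[[1, 1], [0, 2], [-1, 0]]))
```

The depthwise stage shortens each channel from six samples to two because of the stride, and the pointwise stage then expands the two channels into three feature maps.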
Compared to the standard MobileNet V1, all convolution layer strides of our modified MobileNet V1 were changed to two, and the number of feature maps was increased layer by layer. In particular, a dropout layer with a drop rate of 0.5 was added between the convolution layer and the fully connected layer. In addition, the number of output layer nodes was reduced from 1000 to two for our binary classification problem. Table 9 contains a comparative study of our proposed method and multiple other methods on the Physionet Apnea-ECG database, all of which use Lead-II ECG signals. These studies use a training set and a testing set of 35 patients each, in alignment with our work. Among the previous studies compared, [8] has the highest accuracy, specificity, and sensitivity, at 89.4%, 89.1%, and 89.8%, respectively. The proposed algorithm using MobileNet V1 along with GRU exceeds these values, with an accuracy of 90.29%, a specificity of 90.72%, and a sensitivity of 90.01%.

Table 9. A comparative study between the proposed method and multiple additional methods on the
Physionet Apnea-ECG database, using Lead-II ECG signals.

Year            Method                                                 Accuracy (%)   Specificity (%)   Recall/Sensitivity (%)
2017 [21]       HMM                                                    82.33          84.7              85.8
2018 [9]        DNN and HMM using single lead ECG Signal               85             82.1              88.9
2018 [18]       SVM                                                    88.2           85.7              87.2
2019 [19]       AlexNet model, CNN                                     86.22          88.4              89
2019 [22]       Time Window artificial neural network (TW-MLP)         87.3           88.7              85.1
2020 [17]       Unsupervised feature learning, single lead ECG, HMM    85.1           86.2              81.4
2021 [8]        Multiscale DNN                                         89.4           89.1              89.8
Proposed Model  MobileNet V1                                           89.5           89                90
Proposed Model  MobileNet V1 + LSTM                                    90.00          90.34             89.82
Proposed Model  MobileNet V1 + GRU                                     90.29          90.72             90.01

5. Wearable Device Implementation (Sleepify)


After the proposed models classified the ECG signals into apneic and normal patterns with acceptable accuracy, the authors set out to address the cumbersome PSG procedure by designing a compact, accurate, and lightweight wearable device named Sleepify. Patients can wear it with ease while sleeping at home to record their ECG signals. The device consists of two parts: ECG sensors, which record the ECG signals from the patient, and a Raspberry Pi Pico, which then determines whether the patient is suffering from sleep apnea. The device warns patients of any episodes that occurred during the night. With the patient’s consent, the ECG recordings can also be added to a new database of annotated ECG signals. The data can support the patient’s diagnosis and contribute to a new set of databases. The data received from the device when recording ECG signals must be accurate and error-free. For this reason, the authors have employed security methods in the device’s operation; the data can be fetched from or uploaded to the hospital cloud database directly, in encrypted form. Data can also be measured directly from the ECG sensors and then transferred to the Raspberry Pi Pico for classification. Figure 13 shows the design and the components proposed by the authors for a secure and compact wearable device. Tasks such as classification of the ECG signals for apnea detection and their encryption for security are performed on the Raspberry Pi Pico microcontroller.

Figure 13. Proposed design for wearable device.

6. Conclusions and Future Scope
This study advances the detection of Obstructive Sleep Apnea using deep learning approaches in terms of accuracy, sensitivity, and specificity. It proposes an accurate, cost-effective, and non-invasive methodology for identifying obstructive sleep apnea in potential patients, using single-lead ECG signals to train MobileNet V1 + LSTM and MobileNet V1 + GRU, which are CNN-based models integrated with an RNN model. The proposed model outperforms the existing related studies compared in this work. The solution requires significantly less computational power and thus can run on portable devices and return results much faster. The proposed model (MobileNet V1 + GRU) achieved an overall accuracy of 90.29% on the sample dataset, a higher result than the other methods and algorithms compared on the same database and ECG input. Thus, the model architecture and its high efficiency make this study and its implementation viable in, for example, portable wearable devices that detect and respond to OSA events in patients. With the added efficiency, diagnosis and detection become faster for a given amount of processing power. In this article, the authors have developed an architecture for deploying a wearable device that can gather data directly from the hospital’s cloud database and utilize that data directly. Further study will involve integrating the device with the hybrid deep learning model and optimizing accuracy. The authors will develop a device that determines the patient’s obstructive sleep apnea diagnosis in real-time.

Author Contributions: Conceptualization, P.H. and G.R.; Methodology, P.H. and V.S.D.; Formal
analysis, P.S.; Investigation, P.S.; Writing—review & editing, D.P.B.; Supervision, D.P.B.; Project
administration, D.P.B.; Funding acquisition, D.P.B. All authors have read and agreed to the published
version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data available in a publicly accessible repository. The data presented
in this study are openly available in [Apnea-ECG Database, Physionet] at
[https://doi.org/10.13026/C23W2R], reference number [13].
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Bahrami, M.; Forouzanfar, M. Sleep apnea detection from single-lead ECG: A comprehensive analysis of machine learning and
deep learning algorithms. IEEE Trans. Instrum. Meas. 2022, 71, 1–11. [CrossRef]
2. Pavlova, M.K.; Latreille, V. Sleep Disorders. Am. J. Med. 2019, 132, 292–299. [CrossRef]
3. Bahrami, M.; Forouzanfar, M. Deep learning forecasts the occurrence of sleep apnea from single-lead ECG. Cardiovasc. Eng.
Technol. 2022, 13, 809–815. [CrossRef] [PubMed]
4. Prinz, P.N.; Vitiello, M.V.; Raskind, M.A.; Thorpy, M.J. Sleep disorders and aging. N. Engl. J. Med. 1990, 323, 520–526.
5. Mcclure, K.; Erdreich, B.; Bates, J.H.T.; Mcginnis, R.S.; Masquelin, A.; Wshah, S. Classification and detection of breathing patterns
with wearable sensors and deep learning. Sensors 2020, 20, 6481. [CrossRef] [PubMed]
6. Kim, T.; Kim, J.W.; Lee, K. Detection of sleep disordered breathing severity using acoustic biomarker and machine learning
techniques. BioMed Eng. Online 2018, 17, 16. [CrossRef]
7. Song, C.; Liu, K.; Zhang, X.; Chen, L.; Xian, X. An obstructive sleep apnea detection approach using a discriminative hidden
Markov model from ECG signals. IEEE Trans. Biomed. Eng. 2015, 63, 1532–1542. [CrossRef]
8. Shen, Q.; Qin, H.; Wei, K.; Liu, G. Multiscale deep neural network for obstructive sleep apnea detection using RR interval from
single-lead ECG signal. IEEE Trans. Instrum. Meas. 2021, 70, 2506913. [CrossRef]
9. Wang, L.; Lin, Y.; Wang, J. A RR interval based automated apnea detection approach using residual network. Comput. Methods
Programs Biomed. 2019, 176, 93–104. [CrossRef] [PubMed]
10. ResMed Blog Page. Available online: https://www.resmed.co.in/blogs/prevalence-sleep-apnea-india (accessed on 12 January 2023).
11. Li, K.; Pan, W.; Li, Y.; Jiang, Q.; Liu, G. A method to detect sleep apnea based on deep neural network and hidden Markov model
using single-lead ECG signal. Neurocomputing 2018, 294, 94–101. [CrossRef]
12. Singh, H.; Tripathy, R.K.; Pachori, R.B. Detection of sleep apnea from heart beat interval and ECG derived respiration signals
using sliding mode singular spectrum analysis. Digit. Signal Process. 2020, 104, 102796. [CrossRef]
13. Goldberger, A.; Amaral, L.; Glass, L.; Hausdorff, J.; Ivanov, P.; Mark, R.; Mietus, J.; Moody, G.; Peng, C.; Stanley, H. PhysioNet:
Components of a new research resource for complex physiological signals. Circulation 2000, 101, 215–220.
14. Alshaer, H.; Hummel, R.; Mendelson, M.; Marshal, T.; Bradley, T.D. Objective Relationship between Sleep Apnea and Frequency
of Snoring Assessed by Machine Learning. J. Clin. Sleep Med. 2019, 15, 463–470. [CrossRef]
15. Varon, A.; Caicedo, D.; Testelmans, B.; Buyse, S.; Huffel, V. A novel algorithm for the automatic detection of sleep apnea from
single-lead ECG. IEEE Trans. Biomed. Eng. 2015, 62, 2269–2278. [CrossRef] [PubMed]
16. Zarei, A.; Beheshti, H.; Asl, B.M. Detection of sleep apnea using deep neural networks and single-lead ECG signals. Biomed.
Signal Process. Control 2022, 71, 103125. [CrossRef]
17. Feng, K.; Qin, H.; Wu, S.; Pan, W.; Liu, G. A sleep apnea detection method based on unsupervised feature learning and single-lead
electrocardiogram. IEEE Trans. Instrum. Meas. 2020, 70, 4000912. [CrossRef]
18. Surrel, G.; Aminifar, A.; Rincón, F.; Murali, S.; Atienza, D. Online obstructive sleep apnea detection on medical wearable sensors.
IEEE Trans. Biomed. Circuits Syst. 2018, 12, 762–773. [CrossRef]
19. Singh, S.A.; Majumder, S. A novel approach OSA detection using single-lead ECG scalogram based on deep neural network. J.
Mech. Med. Biol. 2019, 19, 1950026. [CrossRef]
20. Stretch, R.; Ryden, A.; Fung, C.H.; Martires, J.; Liu, S.; Balasubramanian, V.; Saedi, B.; Hwang, D. Predicting nondiagnostic home
sleep apnea tests using machine learning. J. Clin. Sleep Med. 2019, 15, 1599–1608. [CrossRef]
21. Gutta, S.; Cheng, Q.; Nguyen, H.D.; Benjamin, B.A. Cardiorespiratory model-based data-driven approach for sleep apnea
detection. IEEE J. Biomed. Health Inform. 2017, 22, 1036–1045. [CrossRef]
22. Wang, T.; Lu, C.; Shen, G. Detection of sleep apnea from single-lead ECG signal using a time window artificial neural network.
BioMed Res. Int. 2019, 2019, 9768072. [CrossRef] [PubMed]
23. Bozkurt, F.; Uçar, M.K.; Bozkurt, M.R.; Bilgin, C. Detection of abnormal respiratory events with single channel ECG and hybrid
machine learning model in patients with obstructive sleep apnea. IRBM 2020, 41, 241–251. [CrossRef]
24. Liang, X.; Qiao, X.; Li, Y. Obstructive sleep apnea detection using combination of CNN and LSTM techniques. In Proceedings of
the 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China,
24–26 May 2019; pp. 1733–1736.
25. Nguyen, H.D.; Wilkins, B.A.; Cheng, Q.; Benjamin, B.A. An online sleep apnea detection method based on recurrence
quantification analysis. IEEE J. Biomed. Health Inform. 2013, 18, 1285–1293. [CrossRef] [PubMed]
26. Kingma, P.; Ba, J.L. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning
Representations, ICLR 2015-Conference Track Proceedings, San Diego, CA, USA, 7–9 May 2015.
27. Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747.
28. Hwang, S.H.; Lee, Y.J.; Jeong, D.U.; Park, K.S. Apnea-hypopnea index prediction using electrocardiogram acquired during the
sleep-onset period. IEEE Trans. Biomed. Eng. 2016, 64, 295–301.
29. Bsoul, M.; Minn, H.; Tamil, L. Apnea medassist: Real-time sleep apnea monitor using single-lead ECG. IEEE Trans. Inf. Technol.
Biomed. 2010, 15, 416–427. [CrossRef]
30. Agarap, A.M.F. Deep Learning using Rectified Linear Units (ReLU). arXiv 2018, arXiv:1803.08375.
31. Mostafa, S.S.; Mendonça, F.; Ravelo-García, A.G.; Morgado-Dias, F. A systematic review of detecting sleep apnea using deep
learning. Sensors 2019, 19, 4934. [CrossRef]
32. Urtnasan, E.; Park, J.U.; Lee, K.J. Automatic detection of sleep-disordered breathing events using recurrent neural networks from
an electrocardiogram signal. Neural Comput. Appl. 2018, 32, 4733–4742. [CrossRef]
33. Narkhede, M.V.; Bartakke, P.P.; Sutaone, M.S. A review on weight initialization strategies for neural networks. Artif. Intell. Rev.
2022, 55, 291–322. [CrossRef]
34. Rusiecki, A. Trimmed categorical cross-entropy for deep learning with label noise. Electron. Lett. 2019, 55, 319–320. [CrossRef]
35. Barba-Guaman, L.; Eugenio Naranjo, J.; Ortiz, A. Deep Learning Framework for Vehicle and Pedestrian Detection in Rural Roads
on an Embedded GPU. Electronics 2020, 9, 589. [CrossRef]
36. Ademola, O.A.; Leier, M.; Petlenkov, E. Evaluation of Deep Neural Network Compression Methods for Edge Devices Using
Weighted Score-Based Ranking Scheme. Sensors 2021, 21, 7529. [CrossRef] [PubMed]
37. Nganga, K. Building A Multiclass Image Classifier Using MobilenetV2 and TensorFlow. Eng. Educ. (Eng. Ed.) Program. 2022.
Available online: https://www.section.io/engineering-education/building-a-multiclass-image-classifier-using-mobilenet-v2-and-tensorflow
(accessed on 14 February 2023).
38. Srinivasu, P.N.; Sivasai, J.G.; Ijaz, M.F.; Bhoi, A.K.; Kim, W.; Kang, J.J. Classification of skin disease using deep learning neural
networks with MobileNet V2 and LSTM. Sensors 2021, 21, 2852. [CrossRef]
39. Widjaja, C.; Varon, A.; Dorado, J.A.; Suykens, S.; Huffel, V. Application of kernel principal component analysis for
single-lead-ECG-derived respiration. IEEE Trans. Biomed. Eng. 2012, 59, 1169–1176. [CrossRef]
40. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; Goodfellow, I., Bengio, Y., Courville, A., Eds.; MIT Press: Cambridge, MA,
USA, 2016; p. 516.
41. Funahashi, K.I.; Nakamura, Y. Approximation of dynamical systems by continuous time recurrent neural networks. Neural Netw.
1993, 6, 801–806. [CrossRef]
42. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [CrossRef]
43. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
44. Al-Angari, H.M.; Sahakian, A.V. Automated recognition of obstructive sleep apnea syndrome using support vector machine
classifier. IEEE Trans. Inf. Technol. Biomed. 2012, 16, 463–468. [CrossRef]
45. Zaremba, W.; Sutskever, I. Learning to execute. arXiv 2014, arXiv:1410.4615.
46. Yang, W.; Fan, J.; Wang, X.; Liao, Q. Sleep apnea and hypopnea events detection based on airflow signals using LSTM network. In
Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC),
Berlin, Germany, 23–27 July 2019.
47. Erdenebayar, U.; Kim, Y.J.; Park, J.-U.; Joo, E.Y.; Lee, K.-J. Deep learning approaches for automatic detection of sleep apnea events
from an electrocardiogram. Comput. Methods Programs Biomed. 2019, 180, 105001. [CrossRef] [PubMed]
48. Ferri, C.; Hernández-Orallo, J.; Modroiu, R. An experimental comparison of performance measures for classification. Pattern
Recognit. Lett. 2009, 30, 27–38. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
