1. Introduction
Obstructive sleep apnea (OSA) is a sleep disorder with a high prevalence among the general population, but it remains heavily underdiagnosed [1, 2]. It is characterized by breathing disturbances during sleep, causing hypoxemia, large chest motions, and arousals from sleep, which fragment the patient’s sleep and reduce phases of rapid eye movement (REM) and slow wave sleep. Consequently, OSA is a recognized risk factor for excessive daytime sleepiness, hypertension, and cardiovascular diseases [
3]. Its severity is measured by the Apnea–Hypopnea Index (AHI), i.e., the number of respiratory events per hour of sleep. An overnight sleep study, or polysomnography (PSG), is required to determine the AHI and diagnose sleep apnea. A patient is then categorized as not suffering from OSA if AHI < 5, as having mild OSA if 5 ⩽ AHI < 15 with presence of symptoms, moderate OSA if 15 ⩽ AHI < 30, or severe OSA if AHI ⩾ 30 [
4]. The procedure is, however, expensive, as well-trained staff and a multitude of sensors are required. These sensors include, among others, electrocardiography (ECG), respiratory inductance plethysmography (RIP), oxygen saturation, electroencephalogram, electrooculogram, and electromyogram. The extensive setup also causes considerable discomfort for the patient, which inhibits a normal sleeping pattern.
To enable screening within the general population and long-term follow-up of patients, home-monitoring systems are required. Many emerging unobtrusive sensor technologies for sleep-related monitoring are based on cardiac and respiratory signals. Firstly, these signals carry information about the different sleep stages and occurring apneas, in the form of patterns of bradycardia, tachycardia and respiratory disturbances. Secondly, they can be acquired more comfortably than the electroencephalogram and oxygen saturation. Huysmans et al. developed a sleep–wake classification network for OSA patients based on convolutional neural networks (CNN) [
5]. In addition, the predicted hypnogram served as input for an OSA patient detection model. Both the network and the model were trained exclusively on ECG and RIP data from PSG. Nevertheless, these were specifically designed to process single short-term segments, as unobtrusive data are likely to be affected by artefacts and data loss. For long-term monitoring, capacitively-coupled ECG (ccECG) and capacitively-coupled bioimpedance (ccBioZ) are likely to be suitable modalities, as these sensors do not require direct contact with the skin. Instead, the skin and electrode serve as electric conductors, with any clothing and fabric as the dielectric medium, generating a capacitive coupling. In this study, the ccECG and ccBioZ sensors integrated in a mattress, previously presented in [
6], were used. Few sleep monitoring studies have been performed with these sensor types. Lee et al. tested the potential of their ccECG recording system for detection of REM sleep and wakefulness through heart rate variability parameters [
7]. Furthermore, Kido et al. classified the sleeping position from ccECG as a step towards personal healthcare [
8]. Deviaene et al. applied an apnea detection algorithm to ccECG and ccBioZ data [
9].
This study has three objectives, of which the first one is related to the technology evaluation and the remaining two to the clinical application. The technology objective is to assess the quality of the ccECG and ccBioZ on a larger patient database compared to the one presented in [
10]. Specifically, the focus is on further understanding the performance of the ccBioZ signal by applying recently developed signal quality indicators (SQIs) [
11] and performing a comparison against the respiratory PSG signal. The first application objective is to further develop the sleep–wake classifier of [
5] to be used with ccECG and ccBioZ. The original classifier is first applied to the new data and fine-tuned afterwards. The final classification outcome based on the capacitively-coupled data is used to predict the patient’s sleep architecture or “hypnogram”. However, it is hypothesized that the restlessness and movements induced by apneas in OSA patients will heavily affect the unobtrusive recordings and degrade the performance of sleep–wake classification. To the best of our knowledge, this is the first study to investigate automated sleep–wake prediction in suspected OSA patients based on ccECG and ccBioZ. The second application objective is to exploit the unobtrusively predicted hypnogram to detect patients with an increased risk of suffering from OSA. In this way, patients can be prioritized directly for a clinical diagnostic test. Hence, this study addresses the need for unobtrusive monitoring of suspected OSA patients.
2. Materials
The dataset consisted of 187 patients with suspected sleep apnea. They underwent a diagnostic PSG at the sleep laboratory of the University Hospitals Leuven (UZ Leuven, Belgium). The ethical committee of UZ Leuven approved the data collection (S60319) and all patients signed an informed consent form. The B3IP device from Medatec (Haillot, Belgium) served as polysomnograph and provided data from the built-in ECG (SPES electrodes) and built-in thoracic RIP (SleepSense belts). Medatec Brainnet Winacq 5.0 and Medatec Brainnet Winrel 5.0 served as the acquisition and analysis software, respectively. The ECG and RIP were sampled at 500 Hz. Based on the PSG data, a clinical sleep expert annotated the sleep stages and apneas according to the AASM 2012 scoring rules [
12].
The complete patient dataset was split into subsets for network optimization and independent model testing. Training of the sleep–wake classifier was performed on 39 patients (CNN_Train), with validation on 17 patients (CNN_Val) and testing on 26 patients (CNN_Test). These datasets only included patients with AHI < 10, and their use is further explained in Section 3.3. Next, dataset CNN_Test was merged with patients with higher AHI and split again according to OSA severity into the subsets No (AHI < 5), Mild (5 ⩽ AHI < 15), Mod (15 ⩽ AHI < 30), and Sev (AHI ⩾ 30). These datasets were used for testing the sleep–wake classifier and for model optimization for OSA patient detection.
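As a minimal sketch of the severity-based split described above, the AHI boundaries can be written as a small lookup function. The function name and the returned labels are illustrative choices; only the numeric boundaries come from the text.

```python
def osa_severity(ahi: float) -> str:
    """Map an AHI value (respiratory events per hour of sleep) to a subset name.

    Boundaries follow the text: No (AHI < 5), Mild (5 <= AHI < 15),
    Mod (15 <= AHI < 30), Sev (AHI >= 30). The function name is hypothetical.
    """
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "No"
    if ahi < 15:
        return "Mild"
    if ahi < 30:
        return "Mod"
    return "Sev"
```

Note that the clinical definition of mild OSA also requires the presence of symptoms, which this sketch does not model.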
Out of 187 patients, 40 patients were part of an additional data collection, complying with the same ethical standards. Their PSG recordings were acquired simultaneously with ccECG and ccBioZ sensors embedded in a mattress, with the setup illustrated in
Figure 1. The ccECG and ccBioZ were acquired as described in [
6], at a sampling frequency of 512 Hz. Due to technical or handling problems (e.g., power cable disconnection), data were correctly collected for only 36 patients. These data were used both for a technology evaluation in combination with signal quality indicator (SQI) algorithms and for an independent validation of the sleep–wake classification and detection of OSA patients. These 36 patients were therefore held out as the unseen dataset Test. The four PSG recordings of patients with missing capacitively-coupled data were excluded to allow statistical tests with repeated measurements within the Test dataset. As such, dataset Test contained 6 non-apneic patients, 4 mild, 9 moderate, and 17 severe OSA patients. The complete overview of datasets can be found in
Table 1.
3. Methods
The PSG and capacitively-coupled signals were first preprocessed by synchronization between the modalities and determination of high-quality data segments. ccECG and ccBioZ data were then evaluated in terms of SQI-based coverage and the agreement of extracted features (e.g., beat-to-beat heart rate, respiration rate) with the gold standard PSG signal. Following this, the data were normalized prior to feeding the CNN (
Section 3.1). Then, the performance of the original sleep–wake classifier was tested on the unobtrusive data (
Section 3.2). This was followed by a classifier improvement step using data augmentation (
Section 3.3). Finally, patients at risk of OSA were detected using indices derived from their predicted sleep–wake pattern. This prediction was based on their capacitively-coupled recordings (
Section 3.4). The full pipeline of the study is illustrated in
Figure 2.
3.1. Data Preprocessing
3.1.1. Quality Assessment and Technology Evaluation against Gold Standard
After acquisition of the raw ccECG and ccBioZ, the signals were processed to provide signals of the highest possible quality. This phase was mainly based on the work of Castro et al. in [
13,
14] for ccECG signals and of Albaba et al. [
11] for ccBioZ signals.
ccECG: The ccECG was acquired by a multi-channel system, producing four simultaneous ccECG channels [
6]. These were all synchronized with the reference ECG and subsequently split into 30 s segments (or “epochs”). For every epoch, the highest quality channel was selected. In this way, a single aggregated ccECG signal was provided [
13]. Next, every segment of the aggregated ccECG was labeled as either high or low quality using a high-threshold SVM model [
14]. Technology evaluation for the ccECG signals comprised determining the SQI-based coverage as the percentage of high-quality data, followed by a feature comparison against the information extracted from the PSG ECG. This included beat detection sensitivity, tachogram correlation, and R-R interval errors, with R-R the time interval between two consecutive beats. In addition, coverage metrics were compared against a previous dataset described in [
10].
ccBioZ: The ccBioZ was a single channel signal [
6], synchronized to the reference RIP and segmented using a 30 s window. The technology evaluation of the ccBioZ signals aimed to evaluate an improved electronic circuit for acquisition of the respiratory activity. ccBioZ signals were therefore evaluated in terms of their SQI-based coverage. The same approach was taken as for ccECG, but with application of the recently developed ccBioZ SQI algorithms (presented in [
11]). Furthermore, both the SQI-preprocessed data and the complete dataset were compared against the gold standard PSG RIP signal. This included metrics of respiration rate error (both in frequency and time domain) as well as the average correlation per patient of the ccBioZ signal with the PSG RIP signal. Finally, the results were compared against the data in [10], applying the same processing. Note that the current study included more patients than the prior data collection in [
10].
3.1.2. Sleep–Wake Classification
The subsequent preprocessing was specifically designed for sleep–wake classification using a dedicated CNN classifier proposed in [
5].
(cc)ECG: Heart beats were detected in 30 s (cc)ECG segments, using the method proposed in [
15]. Segments with fewer than 15 detected beats were removed. In each remaining segment, the first beat was slightly adjusted in time by linear extrapolation, as there could be a relatively wide time gap between the segment start and the first detected beat, which could cause border problems during interpolation in a later step. Thus, the first beat time was defined as the second beat time minus the mean of the second and third beat intervals. The same border effect could occur at the last beat, so its time was redefined as the second-to-last beat time plus the mean of the second- and third-to-last beat intervals. Next, the heart rate (HR) at every beat was calculated in beats per minute (BPM) as HR = 60/R-R, with R-R in seconds. The HR values were subsequently interpolated at 4 Hz to generate a tachogram. HR outliers were identified whenever the HR was outside the range of 40 to 180 BPM, outside the segment’s median value ± 20 BPM, or outside the segment’s median value ± 3 × the segment’s standard deviation (SD). Outliers lasting at most 2.5 s were corrected by filling the outlier gap with mirrored values from just before the gap [16]; segments with outliers of longer duration were removed. Finally, the patient’s overall median HR was subtracted from the signal.
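The tachogram steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the interval indexing used for the border correction, the assignment of each HR value to the beat that closes its R-R interval, and the use of linear interpolation are all assumptions, and the function names are hypothetical. The mirroring-based gap correction and the final median subtraction are left out.

```python
import numpy as np

def tachogram(beat_times, fs=4.0, seg_len=30.0):
    """Return (t, hr) for one segment, or None if the segment is rejected.

    beat_times: detected beat times in seconds, relative to the segment start.
    """
    beats = np.array(beat_times, dtype=float)
    if beats.size < 15:
        return None  # fewer than 15 detected beats: segment removed
    # Border correction (assumed indexing): the "second and third beat
    # intervals" are taken as beats[2]-beats[1] and beats[3]-beats[2].
    beats[0] = beats[1] - 0.5 * (beats[3] - beats[1])
    beats[-1] = beats[-2] + 0.5 * (beats[-2] - beats[-4])
    rr = np.diff(beats)                     # R-R intervals in seconds
    hr = 60.0 / rr                          # HR in BPM, HR = 60 / R-R
    t = np.arange(int(seg_len * fs)) / fs   # uniform 4 Hz grid
    # Linear interpolation is an assumption; the text only states 4 Hz.
    return t, np.interp(t, beats[1:], hr)

def hr_outlier_mask(hr):
    """Flag samples that violate any of the three outlier criteria."""
    hr = np.asarray(hr, dtype=float)
    med, sd = np.median(hr), np.std(hr)
    return ((hr < 40) | (hr > 180)
            | (np.abs(hr - med) > 20)
            | (np.abs(hr - med) > 3 * sd))

def longest_run(mask):
    """Length (in samples) of the longest run of consecutive True values."""
    longest = current = 0
    for flag in mask:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest
```

At 4 Hz, the 2.5 s limit corresponds to 10 samples, so a segment would be kept for correction when `longest_run(hr_outlier_mask(hr)) <= 10` and removed otherwise.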
BioZ/RIP: The 30 s BioZ or RIP segments were bandpass filtered using a Butterworth filter with cutoff frequencies at 0.04 Hz and 2 Hz. Segments were then downsampled to 4 Hz. Every segment was then normalized by subtraction of the patient’s overall median value and division by the patient’s overall SD. Next, the individual median per segment was subtracted.
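A minimal sketch of this respiratory preprocessing chain is given below, using SciPy. The filter order, the zero-phase (forward-backward) filtering, and the plain integer decimation are assumptions not stated in the text; the function and parameter names are hypothetical. The patient's overall median and SD are passed in, since how they were computed over the recording is not specified here.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess_resp(segment, fs, patient_median, patient_sd, fs_out=4, order=2):
    """Band-pass filter, downsample, and normalize one 30 s BioZ/RIP segment."""
    # Butterworth band-pass with the cutoffs from the text (0.04 Hz and 2 Hz).
    sos = butter(order, [0.04, 2.0], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, np.asarray(segment, dtype=float))
    # Keep every (fs / fs_out)-th sample, e.g. 125 for 500 Hz, 128 for 512 Hz.
    step = int(round(fs / fs_out))
    down = filtered[::step]
    # Patient-level normalization, then removal of the segment's own median.
    norm = (down - patient_median) / patient_sd
    return norm - np.median(norm)
```

For a 30 s segment sampled at 500 Hz (15,000 samples), the output has 120 samples.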
3.2. CNN-Based Sleep–Wake Classification
The multimodal sleep–wake network proposed in [
5] was based on a CNN consisting of two branches, illustrated in
Figure 3. One branch received the 30 s tachograms and the other branch the filtered 30 s respiratory effort segments. Every modality could also serve separately as an input to the corresponding unimodal CNN. Training of these networks was performed on PSG recordings. To verify the generalization capability of the unimodal networks on the novel capacitively-coupled modalities, the ccECG was fed to the original cardiac network and the ccBioZ to the respiratory network. For this, dataset Test was applied, which included 36 capacitively-coupled recordings (see
Table 1). Only the high-quality segments were taken into account, as accurate heart beat detection was important for the ccECG and clean waveforms were important for the ccBioZ. It was expected that the cardiac network would generalize well, as the tachogram of the ccECG should be similar to the reference [
13]. The respiratory network, on the other hand, was based on waveforms. This potentially causes a loss in performance, as different wave characteristics and different amplitude changes can be expected in the ccBioZ signal.
3.3. RIP Network Augmentation
Since a lower generalization of the respiratory model was expected, normalization of the respiratory signal was redefined and the RIP network was trained with data augmentation. Finally, the augmented RIP network was merged with the ECG network to create a multimodal network and to investigate performance differences.
3.3.1. Input Normalization
The normalization of the RIP signal during preprocessing (see
Section 3.1.2) ensured that the signal had a standard deviation of one, with a median around zero. This ensured the stability of the CNN learning process and a similar influence of all data segments. However, it did not prevent the network from learning sleep–wake patterns based on the amplitude of the input segment. For example, during deep sleep, the patient’s breathing is deeper and generates an RIP signal with larger amplitude, but the relative change relies heavily on the type of sensor used. In addition, if the ccBioZ data contained many artefacts, the overall SD for normalization could become very large, and dividing the signal by this SD would then result in a small signal amplitude. Therefore, a min-max normalization scheme was chosen, scaling every segment between 0 and 1. This guaranteed that the ccBioZ segments had a similar amplitude range to the RIP segments. As a zero median is preferred for CNN training, the final scaling range was set to −0.5 to 0.5.
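The per-segment min-max scaling can be sketched as below. The function name is hypothetical, and the handling of a completely flat segment (mapped to zeros) is an assumption that the text does not cover.

```python
import numpy as np

def minmax_center(segment):
    """Scale one segment to [0, 1] by min-max, then shift to [-0.5, 0.5]."""
    x = np.asarray(segment, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # Flat segment: no amplitude information; returning zeros is an assumption.
        return np.zeros_like(x)
    return (x - x.min()) / span - 0.5
```

Because the scaling uses only the segment's own minimum and maximum, two segments with the same shape but different amplitudes map to the same output, which is the amplitude independence the normalization aims for.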
3.3.2. Data Augmentation
To further reduce the amplitude dependency and to increase the training set size, the CNN was trained with data augmentation. First, the recording of every patient was duplicated four times. Next, at every training step, each 30 s data segment was scaled by a random factor between 0.2 and 3.2. This range was chosen to keep the amplitude within reasonable bounds. This procedure should increase the number of learning examples and reduce the dependency of the network on changes in amplitude.
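A minimal sketch of the random amplitude scaling is shown below. The uniform distribution of the scaling factor is an assumption (the text only gives the range), and the function name is hypothetical. Duplication of the recordings is left to the caller, since the text does not specify whether "four times" means four copies in total or four additional copies.

```python
import numpy as np

def augment_batch(segments, rng, low=0.2, high=3.2):
    """Scale every segment in a batch by its own random factor.

    segments: array of shape (n_segments, n_samples).
    rng: a numpy.random.Generator, so that runs can be made reproducible.
    """
    segments = np.asarray(segments, dtype=float)
    # One factor per segment, drawn uniformly (assumption) from [low, high).
    factors = rng.uniform(low, high, size=(segments.shape[0], 1))
    return segments * factors
```

Calling this function anew at every training step, e.g. with `rng = np.random.default_rng(seed)`, gives each segment a different amplitude each time it is presented to the network.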
3.3.3. RIP Network Retraining
The RIP network was retrained with this data augmentation procedure and using the PSG recordings from patients with relatively low AHI (AHI < 10, as defined in [
5]). This forced the network to learn patterns related to sleep stages, and not those related to apneas. CNN_Train contained 70% (N = 39) of the data subset for CNN weight training and CNN_Val contained 30% (N = 17) for validation, with N being the number of patients. The splitting of the dataset was repeated ten times, each time with a different randomization seed, to train ten RIP networks. The network with the highest Cohen’s Kappa score (κ) on a separate test set (CNN_Test) was selected. This dataset only included patients with an AHI < 10, but the selected network was also tested on patients with higher AHI. For this, CNN_Test was merged with the remaining 65 recordings and split again according to the conventional OSA classes into datasets No, Mild, Mod, and Sev (see
Table 1).
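The repeated patient-level split and the selection of the best network can be sketched as follows. This is an illustration only: the function names are hypothetical, the shuffling scheme is an assumption, and training and evaluation are not shown.

```python
import random

def split_patients(patient_ids, seed, train_frac=0.7):
    """Randomly split patient IDs into a training and a validation list."""
    ids = sorted(patient_ids)           # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)    # seed-specific shuffle
    n_train = round(train_frac * len(ids))
    return ids[:n_train], ids[n_train:]

def select_best(results):
    """Pick the (seed, kappa) pair with the highest Cohen's Kappa on CNN_Test."""
    return max(results, key=lambda pair: pair[1])
```

With the 56 patients of CNN_Train and CNN_Val combined, a 70% split yields 39 training and 17 validation patients, matching the subset sizes in the text. Splitting at patient level keeps all segments of one patient in the same subset.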
3.3.4. Multimodal Network
The selected augmented RIP network was merged with the ECG network to evaluate the contribution of the latter to the performance of the RIP network. First, the ECG network was trained using the same split for CNN_Train and CNN_Val. Then, the trained convolutional layers of the RIP and ECG network were combined by dense layers (see
Figure 3). The weights of the convolutional layers were frozen, and the dense layers were retrained using CNN_Train and CNN_Val, following the same workflow as described in [5]. Performance differences between classifiers or datasets were assessed with a Wilcoxon signed-rank test, a non-parametric test for repeated measurements.
3.4. Detection of OSA Patients
In [
5], it was shown that sleep–wake predictions become uncertain in the presence of apneas, as the network is trained on patients with relatively few apneas. In addition, apneas disturb the characteristic sleep patterns affecting both the cardiac and respiratory signals. As a result, the percentage of uncertain sleep predictions of the classifier is related to the severity of OSA. Thus, a distinction was made between confident and uncertain predicted epochs. For a wake-predicted epoch to be labeled confident, its probability of wakefulness p(wake) should reach a wake threshold, defined as the median p(wake) of epochs predicted as wake minus its SD, calculated over the patients in CNN_Test. Similarly, p(sleep) of confident sleep-predicted epochs should exceed a sleep threshold, defined as the median p(sleep) of epochs predicted as sleep minus its SD. After defining these thresholds based on the predicted PSG recordings of CNN_Test, the percentage of uncertain sleep epochs in the ccBioZ was calculated. This was the first index for OSA patient detection as proposed in [
5]. A second index was the percentage of sleep–wake and wake–sleep transitions, as there was an expected increase in sleep fragmentation, sympathetic activation, and micro-awakenings.
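The two indices can be sketched as below. Several details are assumptions that the text leaves open: the SD is taken as the population SD over pooled epochs, an epoch counts as sleep-predicted when p(sleep) ⩾ 0.5, the uncertain-sleep percentage is expressed relative to the sleep-predicted epochs, and the transition percentage is expressed relative to the number of consecutive epoch pairs. All function names are hypothetical.

```python
from statistics import median, pstdev

def confidence_threshold(probs):
    """Median minus SD of the class probabilities of epochs predicted as that class."""
    return median(probs) - pstdev(probs)

def uncertain_sleep_pct(p_sleep, sleep_threshold):
    """Percentage of sleep-predicted epochs whose p(sleep) is below the threshold."""
    sleep = [p for p in p_sleep if p >= 0.5]   # sleep-predicted epochs (assumption)
    if not sleep:
        return 0.0
    return 100.0 * sum(p < sleep_threshold for p in sleep) / len(sleep)

def transition_pct(labels):
    """Percentage of consecutive epoch pairs where the predicted state changes."""
    if len(labels) < 2:
        return 0.0
    changes = sum(a != b for a, b in zip(labels, labels[1:]))
    return 100.0 * changes / (len(labels) - 1)
```

The threshold would be computed once from the CNN_Test predictions and then applied unchanged to the capacitively-coupled recordings.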
For simplicity, and as in the previous study [5], a patient was identified as being at risk of OSA (AHI ⩾ 15), i.e., detected positive, if at least one of the two metrics exceeded its selected threshold. These thresholds were selected by means of an ROC analysis, as the values that correspond to a high specificity for OSA detection. The goal was to detect moderate and severe OSA patients at home using the unobtrusive sensors, so they could be prioritized for a diagnostic PSG.
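The decision rule and one possible way to pick a high-specificity threshold can be sketched as follows. The target specificity is a free parameter here because the text does not state the value used, and the threshold search shown is a simple stand-in for a full ROC analysis. All names are hypothetical.

```python
def threshold_for_specificity(negative_scores, target_sp):
    """Smallest threshold at which the specificity reaches the target.

    negative_scores: index values of patients without the condition (AHI < 15).
    A patient is called positive when the index exceeds the threshold, so the
    specificity is the fraction of negatives at or below the threshold.
    """
    scores = sorted(negative_scores)
    if not scores:
        raise ValueError("at least one negative patient is required")
    for thr in scores:
        specificity = sum(s <= thr for s in scores) / len(scores)
        if specificity >= target_sp:
            return thr
    return scores[-1]

def at_risk_of_osa(uncertain_pct, transition_pct, thr_uncertain, thr_transition):
    """Detect a patient as positive if at least one index exceeds its threshold."""
    return uncertain_pct > thr_uncertain or transition_pct > thr_transition
```

Combining the two indices with a logical OR favors sensitivity, which is why each individual threshold is chosen at a high specificity.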
The ccBioZ can be very sensitive to movement due to its unobtrusive nature. This can reduce the signal quality in epochs with increased movement. This means that more epochs with a lower quality can be expected for OSA patients, due to their general restlessness and apneas that evoke large body movements when breathing is restored. In addition, apneas disrupt the regularity of the respiratory signal due to breathing cessations and the cardiac signal due to tachycardia and bradycardia. Thus, the effect of apneas causing (false) wake predictions is expected to be enhanced by respiration monitoring with a capacitively-coupled device.
As a benefit, apnea segments were hypothesized to be informative for detection of patients with high OSA risk. However, when using only high-quality ccBioZ segments, it is expected that only apneas with more subtle physiological reactions are retained. This idea is supported by Deviaene et al. in [
9], who observed a decreased sensitivity for apnea detection when using only high-quality capacitive data. Thus, the procedure for OSA patient detection was assessed with the full ccBioZ recordings (i.e., including both high- and low-quality segments). The performance was compared to the high-quality ccBioZ dataset by means of the Cohen’s Kappa score (κ), accuracy (Acc), sensitivity (Se), specificity (Sp), and diagnostic odds ratio (DOR). The DOR is a single metric of performance that is independent of the prevalence and is defined as DOR = (Se × Sp)/((1 − Se) × (1 − Sp)).
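As a sketch, the listed metrics can be computed from the four cells of a binary confusion matrix using their standard definitions. The function name is hypothetical, and returning infinity when the DOR denominator is zero is an assumption rather than a convention taken from the text.

```python
def detection_metrics(tp, fp, tn, fn):
    """Kappa, Acc, Se, Sp, and DOR from a binary confusion matrix.

    Assumes at least one positive and one negative patient, and an expected
    agreement below 1, so that no denominator other than the DOR's is zero.
    """
    n = tp + fp + tn + fn
    acc = (tp + tn) / n
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    # Expected chance agreement for Cohen's Kappa.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (acc - p_e) / (1 - p_e)
    # DOR = (TP x TN) / (FP x FN), equal to Se x Sp / ((1 - Se) x (1 - Sp)).
    dor = (tp * tn) / (fp * fn) if fp and fn else float("inf")
    return {"kappa": kappa, "acc": acc, "se": se, "sp": sp, "dor": dor}
```

For arbitrary illustrative counts (not study results) of 8 true positives, 2 false positives, 6 true negatives, and 4 false negatives, this gives Acc = 0.7, Se ≈ 0.67, Sp = 0.75, κ = 0.4, and DOR = 6.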