Open access

Leveraging Attention-reinforced UWB Signals to Monitor Respiration during Sleep

Authors:

Haiqin Liu

ACM Transactions on Sensor Networks, Volume 20, Issue 5

Article No.: 108, Pages 1 - 28

https://doi.org/10.1145/3680550

Published: 26 August 2024

Abstract

The respiration state during overnight sleep is an important indicator of human health. However, existing contactless solutions for sleep respiration monitoring either perform in controlled environments and have low usability in practical scenarios or only provide coarse-grained respiration rates, being unable to accurately detect abnormal events in patients. In this article, we propose Respnea, a non-intrusive sleep respiration monitoring system using an ultra-wideband device. Particularly, we propose a profiling algorithm, which can locate the sleep positions in non-controlled environments and identify different subject states. Further, we construct a deep learning model that adopts a multi-head self-attention mechanism and learns the patterns implicit in the respiration signals to distinguish sleep respiration events at a granularity of seconds. To improve the generalization of the model, we propose a contrastive learning strategy to learn a robust representation of the respiration signals. We deploy our system in hospital and home scenarios and conduct experiments on data from healthy subjects and patients with sleep disorders. The experimental results show that Respnea achieves high temporal coverage and low errors (a median error of 0.27 bpm) in respiration rate estimation and reaches an accuracy of 94.44% on diagnosing the severity of sleep apnea-hypopnea syndrome.

1 Introduction

The respiration state during sleep, especially whether sleep disorders occur, is an important indicator for monitoring physical and mental health [7]. Sleep apnea is a typical sleep disorder in which breathing is briefly and repeatedly interrupted for over 10 seconds during sleep. When sleep apnea events occur over a certain frequency, they can develop into sleep apnea-hypopnea syndrome (SAHS) [29] and threaten human health. Previous research has shown that SAHS has been related to many other diseases, including diabetes, hypertension, heart disease, depression, and obesity [32, 40, 43]. It is estimated that nearly 1 billion people worldwide have sleep apnea, which is 10 times larger than previous estimates [41]. Therefore, there is great demand for in-home sleep respiration monitoring, which can not only offer the support of preliminary diagnosis and early warning of sleep apnea but also be used for follow-up of patients with SAHS so as to track the disease progress since their last visits to hospitals.

Traditionally, polysomnography (PSG) is clinically used to diagnose SAHS. Based on the overnight PSG data of a subject, medical technicians can diagnose different respiration events, including central apnea, obstructive apnea, mixed apnea, and hypopnea events. Then the apnea-hypopnea index (AHI) is calculated to diagnose the severity of SAHS. However, such a PSG test is cumbersome, expensive, and unsuitable for in-home use. Fortunately, the past few years have witnessed a surge of development in contactless sensing. Based on the fact that respiration is closely related to chest displacements and wireless signals reflected by the human body can capture subtle body movements, researchers have been exploiting various wireless signals (including Wi-Fi [58, 62], RFID [51, 63], and acoustic signals [9, 13, 16]) to monitor respiration rates or sleep disorders. Compared to sensing by wearable devices (e.g., smart watches [11, 19, 42] and chest belts [20, 33, 37]), contactless sensing is non-intrusive and easy to deploy at home. It reduces the discomfort of long-term device contact and lowers the chance that data will become unavailable due to incorrect wear.

However, existing solutions to contactless sleep respiration monitoring suffer from several limitations. First, most studies are performed in controlled scenarios and are sensitive to environmental changes. Second, some research is conducted either on healthy subjects or on simulated data (e.g., holding one’s breath consciously to mimic sleep apnea). There is a substantial discrepancy between these data and data from real patients. For example, respiration events occurring during the sleep of patients have different types and durations, which leads to different manifestations in the respiration signals and greatly increases the complexity of detecting respiration events. Recent work on sleep respiration monitoring adopts data from patients. Unfortunately, the detection accuracy is far from satisfactory. Therefore, accurate sleep respiration monitoring in non-controlled environments is still challenging.

In this article, we propose Respnea, a non-intrusive sleep respiration monitoring system using an ultra-wideband (UWB) device. Respnea enables in-home use to monitor respiration rates and respiration events during the overnight sleep of subjects in a fine-grained way. The main contributions of this article are as follows.

- We propose an overnight respiration profiling algorithm. By leveraging both amplitude and phase information from UWB signals, we first locate the position of the bed, identify the subject states and sleep duration, and then estimate the respiration rates, thus enabling reliable sleep respiration monitoring in unknown practical scenarios.

- We design a deep learning model with the multi-head self-attention mechanism and a contrastive learning module to differentiate sleep apnea and hypopnea states from normal respiration states. We further incorporate a multi-window voting mechanism to rectify the incorrect respiration states, and aggregate these states into sleep apnea or hypopnea events.

- We conduct extensive experiments in both in-hospital and in-home scenarios to evaluate the performance of Respnea. The experimental results show that Respnea achieves a median error of 0.27 bpm in respiration rate estimation and an accuracy of 94.44% on diagnosing the SAHS severity, outperforming the three baselines.

The rest of the article is organized as follows. Section 2 introduces the related work. Section 3 describes the design of Respnea in detail. Section 4 gives the experimental evaluation. Last, the article is concluded in Section 5.

2 Related Work

PSG is clinically used to diagnose SAHS. PSG is composed of multiple sensors: a nasal pressure transducer to measure airflow, chest and abdomen belts to measure thoracic and abdominal respiration motions, a pulse oximeter to measure oxygen saturation, several electroencephalogram (EEG) sensors to measure brain activity, and so on. Based on the overnight PSG data of a subject, medical technicians can diagnose respiration events. Here, EEG data are used to determine whether the subject is asleep or not. Data of airflow, oxygen saturation, and thoracic and abdominal respiration motions are used to detect different respiration events, including central apnea, obstructive apnea, mixed apnea, and hypopnea events. Then the AHI can be calculated from these detected respiration events and the total sleep time. All of these can help doctors diagnose the severity of SAHS of the subject. The process of a PSG test requires operations by medical professionals. Therefore, PSG is only used in hospitals, and cannot be applied to in-home scenarios. In addition, one overnight PSG test is usually expensive.
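The AHI computation mentioned above can be sketched as follows. This is a minimal illustration, assuming the usual definition of AHI as the number of apnea and hypopnea events per hour of sleep. The severity cut-offs (5, 15, and 30 events per hour) are the commonly used adult clinical thresholds and are an assumption here, since this section does not state which cut-offs Respnea uses. The function names are hypothetical.

```python
def apnea_hypopnea_index(n_apnea, n_hypopnea, total_sleep_hours):
    """Events per hour of sleep (assumed standard AHI definition)."""
    if total_sleep_hours <= 0:
        raise ValueError("total_sleep_hours must be positive")
    return (n_apnea + n_hypopnea) / total_sleep_hours


def sahs_severity(ahi):
    """Map AHI to a severity label using commonly cited adult cut-offs
    (an assumption, not taken from this article)."""
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


# 60 apnea events and 45 hypopnea events over 7 hours of sleep
ahi = apnea_hypopnea_index(60, 45, 7.0)
label = sahs_severity(ahi)
```

Because the denominator is total sleep time rather than time in bed, an error in sleep onset or exit detection changes the AHI directly.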

As an alternative solution to the PSG test, wearable sensors like photoplethysmography (PPG) based devices have been applied in respiration and sleep monitoring. For example, OptiBreathe [39] is an on-device earable system for respiration monitoring, which employs multiple signal processing algorithms to measure respiration rates and breathing phases from PPG signals. Ravichandran et al. [36] propose an encoder-decoder architecture utilizing residual blocks to perform the task of extracting the respiration signal from a given PPG input. Nabavi et al. [30] propose an intraoral PPG-based sleep monitoring system to measure PPG signals from the oral cavity and estimate AHI. Hahm et al. [18] propose a sleep apnea identification system via PPG signals, in which they identify the obstructed sleep apnea in the frequency domain analysis that compares the existence of frequency components between normal and abnormal respiration in real time. Massie et al. [28] extract the respiratory information embedded in the finger PPG data, and train an ensemble of tree classifiers that predicts the central or obstructive nature of each respiratory event. Wei et al. [49] propose a sleep apnea detection method combining a multi-scale one-dimensional convolutional neural network (CNN) and a shadow one-dimensional CNN based on dual-channel input from PPG signals. However, these devices still cause discomfort to users through long-term contact with the skin, which further affects their sleep quality. Meanwhile, the data may become unavailable due to incorrect wear. In addition, since some devices such as PPG sensors rely on measuring light absorbance through the skin, they are known to be affected by skin color and yield unstable monitoring performance in subjects with dark skin [14].

In recent years, contactless sensing, as a newly-emerging technology, has provided a number of solutions for vital sign monitoring [12, 17, 46, 60, 64]. For example, Raheel et al. [34] exploit the UWB signals to detect the respiration rates and heart rates of subjects when they are still. Yang et al. [52] use millimeter wave (mmWave) signals for vital sign monitoring, in which the mmWave signals can be directed towards the human’s body and the received signal strength (RSS) of the reflections can be analyzed for accurate estimation of breathing and heart rates. Adib et al. [5] exploit the fast Fourier transform based methods to capture the vital signs from the frequency modulation continuous wave signals in the home scenario. Pi-ViMo [59] is a physiology-inspired vital sign monitoring system using mmWave signals that employs a template matching method to extract human vital signs by adopting physical models of respiration and cardiac activities. Xu et al. [50] leverage audio devices on smartphones to collect acoustic signals and propose a generative adversarial network to generate fine-grained breathing waveforms from the extracted breathing patterns in respiration signals. Wang et al. [45] reconstruct the respiration and heartbeat signals by jointly optimizing the decomposition of all the extracted superposed vital signals over different range-azimuth bins, therefore obtaining fine-grained vital signs from the millimeter wave signals. Further, we note that some work aims to monitor overnight vital signs and detect respiration events, where the monitored subjects are asleep.

For overnight vital sign monitoring, WiFi-Sleep [55] adopts the fine-grained channel state information of Wi-Fi signals for monitoring respiration rates. Liu et al. [26] exploit Wi-Fi signals to estimate the overnight respiration rates for one-person and two-person in-bed cases. Hussain et al. [21] attach RFID tags to the shirt of the subject at the abdominal position, so as to enable respiration monitoring throughout the night. DoppleSleep [35] employs a commercial 24-GHz short-range Doppler continuous wave radar to continuously track human vital signs, enabling real-time and efficient sleep monitoring. Li et al. [25] make use of a UWB device to monitor not only the respiration rates during sleep but also the respiration depths and some respiration patterns. Yue et al. [56] propose DeepBreath, an RF-based respiration monitoring system that can recover the breathing signals of multiple individuals, in which they model interference due to multiple reflected RF signals and demonstrate that the original breathing can be recovered via independent component analysis. Zhang et al. [61] exploit the time-domain autocorrelation function to estimate breathing rates, in which they perform maximal ratio combining over multiple subcarriers of Wi-Fi signals to maximize the breathing signal and achieve respiratory rate estimation at home. WiResP [47] is a WiFi-based respiration monitoring system that combines both instantaneous and time-domain information to improve the detection of respiration during sleep.

For respiration event detection, existing solutions can be divided into two categories: rule-based solutions and model-based solutions. Rule-based solutions detect sleep apnea events via hand-crafted rules based on the waveform morphology of the events. For example, TagBreathe [48] captures apnea events by leveraging the changes in RFID signals. UbiBreathe [4] applies the discrete wavelet transform to extract the hidden breathing signals from the noisy Wi-Fi signals, and detects apnea events by monitoring the changes in RSS. However, these apnea detection studies rely on simulated datasets, in which only central apnea events can be mimicked by holding the breath. Some later work adopts data from real overnight scenarios. For example, Li et al. [25] leverage the amplitude changes of UWB signals to capture central apnea events during sleep. mmVital [53] measures the distance between peaks and the amplitude of the reflected RSS of mmWave signals to detect central apnea and hypopnea events. ApneaApp [31] transforms a phone into an active sonar system emitting sound signals and identifies central apnea, obstructive apnea, and hypopnea events by detecting peaks of reflected signals. However, simple hand-crafted rules cannot deal with complex respiration events from different patients. In contrast, model-based solutions develop different machine learning models to improve the performance of sleep apnea detection. For example, Koda et al. [24] apply a support vector machine algorithm to the 24-GHz mmWave signals for detecting central and mixed sleep apnea events. Chen et al. [10] calculate features including physical, respiration, heartbeat, and movement features from mmWave signals, and apply them for sleep apnea classification via the ensemble subspace k-nearest neighbors. Kang et al. [22] design a hybrid neural network consisting of CNNs and long short-term memory (LSTM) networks to detect snoring and obstructive sleep apnea. Romero et al. [38] analyze sleep breathing sounds recorded with a smartphone at home, and apply a CNN model for sleep apnea screening. However, these solutions adopt simple feature extraction, which cannot deal with various complicated sleep apnea events in real patients. What is worse, they adopt coarse-grained labels that will lead to underestimation of the number of sleep apnea events. Besides, most studies on sleep monitoring are performed in controlled environments, which makes them impractical for in-home scenarios.

In comparison to existing solutions, Respnea combines advanced signal processing techniques and a deep learning model, and can accurately detect multiple respiration events in non-controlled environments.

3 Design of Respnea

3.1 Overview

To monitor the overnight respiration of a subject, we place a UWB device beside the bed where the subject sleeps (e.g., on a nightstand).

The UWB device continuously transmits pulse signals of fixed frequency \(f_s\) at a certain interval (i.e., the pulse repetition interval (PRI)), and collects the signals reflected off subjects in the environment. The reflected signals are down-converted to two baseband signals, I and Q, which are recorded in a two-dimensional (2D) complex matrix \(\boldsymbol {R}\) whose elements contain the amplitude and phase of the reflected signals. As shown in Figure 1, the row of matrix \(\boldsymbol {R}\), also called the “fast-time” dimension, contains reflected pulse responses with different time delays during a PRI, denoting different distances between the device and different subjects. The column, also called the “slow-time” dimension, updates every PRI, denoting the timestamps.

Fig. 1.

\(\boldsymbol {R}\) can be segmented into multiple submatrices using a sliding window (whose size is \(\theta\)) with a fixed sliding step (whose size is \(\delta\)). When both \(\theta\) and \(\delta\) are set to 30 seconds, the resulting submatrices are denoted as \(\boldsymbol {R_m}, m=1, 2, \ldots\). When \(\theta\) is set to 30 seconds and \(\delta\) is set to 10 seconds, the resulting submatrices are denoted as \(\boldsymbol {S_m}, m=1, 2, \ldots\).
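The two segmentations differ only in the sliding step, which a short sketch makes concrete. The helper name `window_bounds` is hypothetical, and dropping a trailing partial window is an assumption, since the text does not say how the end of the recording is handled.

```python
def window_bounds(total_seconds, theta=30, delta=30):
    """Return (start, end) offsets in seconds of every full sliding window.

    theta is the window size and delta the sliding step, both in seconds.
    A trailing partial window is dropped (an assumption of this sketch).
    Multiply the offsets by the slow-time sampling rate to get row indices.
    """
    if theta <= 0 or delta <= 0:
        raise ValueError("theta and delta must be positive")
    return [(s, s + theta) for s in range(0, total_seconds - theta + 1, delta)]


# Two minutes of signal:
r_windows = window_bounds(120, theta=30, delta=30)  # non-overlapping, like R_m
s_windows = window_bounds(120, theta=30, delta=10)  # overlapping, like S_m
```

With a 10-second step and a 30-second window, every second away from the recording boundaries is covered by three consecutive windows.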

Respnea consists of two components: overnight respiration profiling (with \(\boldsymbol {R_m}\) as input) and respiration event detection (with \(\boldsymbol {S_m}\) as input), as shown in Figure 2. The overnight respiration profiling component contains respiration signal extraction, bed positioning, subject state identification, respiration rate estimation, and sleep onset and exit detection, where the subject state is classified into five categories, as shown in Table 1. The respiration event detection component includes sleep state classification, respiration event aggregation, and AHI estimation and SAHS diagnosis, where the sleep state is classified into normal respiration, sleep apnea, and hypopnea states.

Table 1. Five States of Subject during Monitoring Period

Subject State	Description
State I	The subject is not in the bed.
State II	The subject is in the bed, but there exists large interference outside the bed.
State III	The subject is in the bed, and body motions occur, involving obvious motions (State III-a) and moderate motions (State III-b), where State III-a refers to torso motions (e.g., turnovers, going to bed, getting out of bed, etc.) and State III-b refers to limb motions (e.g., leg motions, hand motions, head motions, etc.).
State IV	The subject is in the bed, stationary, with regular and detectable respiration.
State V	The subject is in the bed, stationary, but with irregular respiration (e.g., sleep apnea events).

Fig. 2.

3.2 Overnight Respiration Profiling

3.2.1 Extracting Respiration Signal.

To remove clutter in \(\boldsymbol {R_m}\), which is caused by all static objects in the environment, we perform background subtraction [54] on \(\boldsymbol {R_m}\) and the result is denoted as \(\boldsymbol {R_m^{^{\prime }}}\).

Next, given each \(\boldsymbol {R_m^{^{\prime }}}\), we sum up the amplitude of \(\boldsymbol {R_m^{^{\prime }}}\) along the slow-time dimension, then find the fast-time index where the sum of amplitude is the largest, denoted as \(d_m\). The index \(d_m\) indicates the position of the subject.

Then, we take \(\boldsymbol {r_m} = \boldsymbol {R_m}(:, d_m)\) as the signal reflected off the subject, and apply the Savitzky-Golay polynomial least squares filter [27] to remove the high-frequency noise from the signal \(\boldsymbol {r_m}\) while preserving the rapid changes and retaining the positions of the peaks and valleys.

Last, we perform a detrend operation on the denoised signal to remove the polynomial trend of the time series. The result, denoted as \(\boldsymbol {r_m^{^{\prime }}}\), contains the respiration signal from the subject within the mth sliding window.
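The first two steps of this subsection (clutter removal and range-bin selection) can be sketched as below. This is a simplified illustration, not the authors' implementation: subtracting each range bin's mean over slow time is used as a stand-in for the background subtraction method of [54], and the function names are hypothetical. Each frame is one slow-time sample and holds one complex value per fast-time bin. The smoothing and detrending steps are omitted.

```python
import cmath
import math


def remove_static_clutter(frames):
    """Subtract each range bin's slow-time mean (stand-in for [54])."""
    n_frames, n_bins = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_bins)]
    return [[f[d] - means[d] for d in range(n_bins)] for f in frames]


def strongest_range_bin(frames):
    """Index d_m of the fast-time bin with the largest summed amplitude."""
    n_bins = len(frames[0])
    energy = [sum(abs(f[d]) for f in frames) for d in range(n_bins)]
    return max(range(n_bins), key=energy.__getitem__)


# Toy data: bin 0 is a strong static reflector, bin 2 carries a small
# periodic phase change such as a moving chest would produce.
frames = [
    [5 + 0j, 0.1 + 0j, cmath.rect(1.0, 0.5 * math.sin(2 * math.pi * t / 50))]
    for t in range(300)
]
d_raw = strongest_range_bin(frames)                         # static object wins
d_m = strongest_range_bin(remove_static_clutter(frames))    # moving target wins
r_m = [f[d_m] for f in frames]                              # R_m(:, d_m)
```

The toy data shows why clutter removal comes first: without it, the strongest static reflector would be mistaken for the subject.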

3.2.2 Locating Position of Bed.

Given \(\boldsymbol{r_m^{^{\prime }}}, m=1,2,\ldots\), if the subject is detected to be in State IV during the period corresponding to \(\boldsymbol{r_m^{^{\prime }}}\), then the position of the subject is a candidate for the position of the bed. After processing all \(\boldsymbol{r_m^{^{\prime }}}\), we obtain a sequence of positions where the subject is in State IV and then obtain the median l of the positions. Thus the position of the bed is set to [\(l-\frac{w_b}{2}\), \(l+\frac{w_b}{2}\)], where \(w_b\) is the preset width of the bed.
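As a small illustration of the median step, the sketch below computes the bed interval from a list of State IV positions. The function name and the use of metres are assumptions of this example.

```python
from statistics import median


def bed_interval(state_iv_positions, bed_width):
    """Return [l - w_b/2, l + w_b/2], where l is the median position."""
    l = median(state_iv_positions)
    return (l - bed_width / 2, l + bed_width / 2)


# One outlier at 3.4 m (e.g., a window dominated by another reflector)
# does not move the median.
interval = bed_interval([1.9, 2.0, 2.1, 2.0, 3.4], bed_width=1.0)
```

Using the median rather than the mean keeps occasional misdetections from shifting the estimated bed position.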

To detect State IV of the subject from the given \(\boldsymbol{r_m^{^{\prime }}}\), we calculate the phase difference \(\Delta \Phi\) between the maximum and minimum phase values within \(\boldsymbol{r_m^{^{\prime }}}\), and then estimate the relative displacement \(\Delta d\) of the human body within the mth window by \(\Delta d=-\frac{\lambda }{4\pi }\Delta \Phi\), where \(\lambda\) is the wavelength of the UWB signals. We call \(\Delta d\) the window depth. If the window depth does not exceed a pre-defined threshold \(\alpha _1\) (i.e., the maximum chest displacement caused by respiration), then the subject is not thought to perform obvious body motions within the mth window. Then, we further apply the autocorrelation function [8] to the phase and amplitude sequences of \(\boldsymbol{r_m^{^{\prime }}}\), respectively. From the autocorrelation result, the maximum peak value and the lag of the peak can be obtained, where the lag (denoted as \(N_{lag}\)) of the maximum peak corresponds to cycle T of the signal (i.e., \(T = N_{lag}/f_s\)). Considering that the normal respiration cycle lasts 2–10 seconds, we search for the maximum peak value with the lag in \([2 \times f_s, 10 \times f_s]\), and refer to it as the respiration autocorrelation value hereafter. If the respiration autocorrelation value of either the amplitude or the phase sequence exceeds a certain threshold \(\beta\), then the subject is regarded as being in State IV.
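The two quantities used for State IV detection can be sketched as follows. This is a minimal illustration under stated assumptions: `fs` is taken to be the number of slow-time samples per second, the autocorrelation is the biased, mean-removed estimate normalized by the zero-lag value, and the window depth is returned as a magnitude so it can be compared with a threshold. Function names are hypothetical.

```python
import math


def window_depth(phases, wavelength):
    """Magnitude of the displacement (lambda / (4*pi)) * (max - min phase).
    Phases are assumed to be unwrapped and given in radians."""
    return wavelength / (4 * math.pi) * (max(phases) - min(phases))


def autocorrelation(x):
    """Biased, mean-removed autocorrelation normalized so that acf[0] == 1."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    denom = sum(v * v for v in c)
    if denom == 0:
        return [0.0] * n
    return [sum(c[i] * c[i + k] for i in range(n - k)) / denom for k in range(n)]


def respiration_autocorr(x, fs, min_period=2.0, max_period=10.0):
    """Largest autocorrelation peak with lag in [2*fs, 10*fs].
    Returns (peak value, N_lag), or (0.0, None) if no peak is found."""
    acf = autocorrelation(x)
    lo = max(int(min_period * fs), 1)
    hi = min(int(max_period * fs), len(acf) - 2)
    peaks = [k for k in range(lo, hi + 1) if acf[k - 1] < acf[k] >= acf[k + 1]]
    if not peaks:
        return 0.0, None
    best = max(peaks, key=acf.__getitem__)
    return acf[best], best


# 30 s window at 10 samples/s containing a 5 s breathing cycle
fs = 10
signal = [math.sin(2 * math.pi * i / 50) for i in range(30 * fs)]
value, n_lag = respiration_autocorr(signal, fs)
depth = window_depth([0.0, math.pi], wavelength=0.04)
```

In this toy case the peak falls at a lag of 50 samples, matching the 5-second cycle. A window would be labeled State IV when `depth` stays below \(\alpha_1\) and `value` exceeds \(\beta\) for the amplitude or the phase sequence.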

3.2.3 Identifying Subject State.

After obtaining the bed position, we re-analyze \(\boldsymbol {R_m^{^{\prime }}}\). In detail, we sum up the amplitude of \(\boldsymbol {R_m^{^{\prime }}}\) along the slow-time dimension to obtain a vector that represents the total amplitude intensity at different distances from the device within the mth window, and the vector can be visualized as a polyline. The subject state is classified as State I if (i) no peak of the polyline is detected in the index interval \([ind_1, ind_2]\), where \([ind_1, ind_2]\) corresponds to the position of the bed (i.e., [\(l-\frac{w_b}{2}\), \(l+\frac{w_b}{2}\)]), or (ii) all peaks detected in \([ind_1, ind_2]\) are significantly smaller than the first value of the polyline (the first value denotes the closest position to the device). Condition (ii) comes from the fact that the signal on the direct path becomes the strongest one when the subject is far from the device. Figure 3 shows an example of (ii).

Fig. 3.

If the value of the maximum peak detected in \([ind_1, ind_2]\) is smaller than the peak value detected outside the bed, this shows that, although the subject is in the bed, someone else is moving outside the bed; the subject is then in State II. Figure 4 shows an example of subject 1 in State II.

Fig. 4.

If the value of the maximum peak detected in \([ind_1, ind_2]\) is the maximum value of the polyline, then we consider the subject to be in the bed and not significantly disturbed by others outside the bed. Then, we take the index of the maximum peak detected in \([ind_1, ind_2]\) as \(d_m\) (where the subject is), and obtain \(\boldsymbol{r_m^{^{\prime }}}\) from \(\boldsymbol{R_m}\) by the method in Section 3.2.1. Given \(\boldsymbol{r_m^{^{\prime }}}\), we estimate the window depth \(\Delta d\) (i.e., the relative displacement of the human body). If the window depth exceeds a pre-defined threshold \(\alpha _1\) (i.e., the maximum chest displacement caused by respiration), then an obvious body motion is thought to occur within the mth window, i.e., the subject is in State III-a.

If the subject is not in State III-a, then we detect whether the subject is in State IV by the method in Section 3.2.2.

If State IV is excluded, then we apply the method of detecting State III-a again (but replacing \(\alpha _1\) with \(\alpha _2\), \(\alpha _2\lt \alpha _1\)) to detect State III-b. Note that we detect State III-b after excluding State IV, so as to separate some waveforms of State III-b from waveforms of State IV with similar window depth. Last, after excluding States III and IV, the subject can only be in State V.
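The order of the checks in this subsection matters, so it may help to see them as a single decision cascade. The sketch below takes precomputed features as input. The `ratio` parameter, which stands for "significantly smaller than the first value of the polyline", and all threshold values are hypothetical, since the text does not give them.

```python
def classify_subject_state(bed_peak, direct_path, outside_peak, depth,
                           resp_autocorr, *, alpha1, alpha2, beta, ratio=0.5):
    """Decision cascade for the subject states of Table 1.

    bed_peak      largest polyline peak inside [ind_1, ind_2], or None
    direct_path   first value of the polyline (closest to the device)
    outside_peak  largest polyline peak outside the bed interval
    depth         window depth (displacement magnitude)
    resp_autocorr larger respiration autocorrelation value of the
                  amplitude and phase sequences
    """
    if bed_peak is None or bed_peak < ratio * direct_path:
        return "I"        # nobody in the bed
    if bed_peak < outside_peak:
        return "II"       # strong interference outside the bed
    if depth > alpha1:
        return "III-a"    # obvious (torso) motion
    if resp_autocorr > beta:
        return "IV"       # regular, detectable respiration
    if depth > alpha2:
        return "III-b"    # moderate (limb) motion
    return "V"            # stationary with irregular respiration


thresholds = dict(alpha1=0.02, alpha2=0.005, beta=0.6)  # illustrative values
state = classify_subject_state(3.0, 1.0, 1.0, 0.008, 0.8, **thresholds)
```

Checking State IV before State III-b reflects the note above: a window with depth between \(\alpha_2\) and \(\alpha_1\) is labeled as regular respiration whenever its periodicity is strong enough.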

3.2.4 Estimating Respiration Rate.

When the subject is in State IV, we apply the autocorrelation function to the phase and amplitude sequences of \(\boldsymbol{r_m^{^{\prime }}}\). We select whichever of the two sequences has the higher respiration autocorrelation value, obtain the corresponding \(N_{lag}\), and estimate the respiration rate of the subject by \(\frac{60}{N_{lag}/f_s}\) breaths per minute.
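A worked instance of the rate formula, as a short sketch. Each input pair holds a respiration autocorrelation value and its lag; the function name and the sample numbers are illustrative only.

```python
def estimate_respiration_rate(amplitude_result, phase_result, fs):
    """Pick the sequence with the higher respiration autocorrelation value
    and convert its lag N_lag to breaths per minute: 60 / (N_lag / fs)."""
    _, n_lag = max(amplitude_result, phase_result, key=lambda r: r[0])
    return 60.0 / (n_lag / fs)


# Phase wins (0.8 > 0.6); a lag of 50 samples at 10 samples/s is a 5 s cycle.
rate = estimate_respiration_rate((0.6, 40), (0.8, 50), fs=10)
```

Here the estimate is 60 / 5 = 12 bpm. Had the amplitude sequence scored higher, its lag of 40 samples would have given 15 bpm instead.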

3.2.5 Detecting Sleep Onset and Exit.

Sleep onset refers to the first transition from awake to sleep after going to bed, and sleep exit refers to the last transition from sleep to awake before getting out of bed. Let \(T_o\) denote the sleep onset time and \(T_e\) denote the sleep exit time, then the sleep duration is \(T_e - T_o\).

We employ a long-short window based algorithm shown in Algorithm 1, and detect the sleep onset and exit of the subject by the following two steps.

As a first step, we segment the entire monitoring duration into W non-overlapping 30-minute windows, and then perform Algorithm 1 with a long window of 30 minutes and a short window of 30 seconds (i.e., the number of short windows in a long window \(N_s = 60\)) to obtain the coarse-grained \(T_o\) and \(T_e\).

As a second step, we select the signals from 30 minutes before \(T_o\) to 30 minutes after \(T_o\) and segment this duration into non-overlapping 10-minute windows. Then we adopt a long window of 10 minutes and a short window of 30 seconds (i.e., \(N_s = 20\)) to perform Algorithm 1 again, thus updating \(T_o\). Also, \(T_e\) is updated by performing Algorithm 1 with the signals from 30 minutes before \(T_e\) to 30 minutes after \(T_e\) as input.

After obtaining the refined \(T_o\) and \(T_e\) in the second step, we zoom in on this period of \([T_o, T_e]\) to further analyze the respiration events of the subject.

3.3 Respiration Event Detection

3.3.1 Identifying Sleep States.

In the same way as extracting \(\boldsymbol{r_m^{^{\prime }}}\) from \(\boldsymbol{R_m}\) in Section 3.2.1, we extract the respiration signal \(\boldsymbol{s_m^{^{\prime }}} = (s_{m,1}^{^{\prime }}, s_{m,2}^{^{\prime }}, \ldots , s_{m,n}^{^{\prime }})\) from \(\boldsymbol{S_m}\), where \(n = f_s \times \theta\) is the number of signal samples in \(\boldsymbol{s_m^{^{\prime }}}\). Using the respiration signal \(\boldsymbol{s_m^{^{\prime }}}\), we construct the amplitude sequence \(\boldsymbol{s_m^a}\) and the phase sequence \(\boldsymbol{s_m^p}\). Then \(\boldsymbol{s_m^a}\) and \(\boldsymbol{s_m^p}\) are normalized by performing a Z-score operation, and the results are denoted as \(\boldsymbol{s_m^{na}}\) and \(\boldsymbol{s_m^{np}}\).
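The construction of the two normalized input sequences can be sketched as below. Using the population standard deviation and mapping a constant sequence to zeros are assumptions of this example, and the function names are hypothetical.

```python
import cmath
from statistics import fmean, pstdev


def zscore(seq):
    """Z-score normalization: zero mean, unit (population) standard deviation."""
    mu, sigma = fmean(seq), pstdev(seq)
    if sigma == 0:
        return [0.0] * len(seq)
    return [(v - mu) / sigma for v in seq]


def amplitude_phase_inputs(samples):
    """Split complex samples into normalized amplitude and phase sequences."""
    amplitude = [abs(z) for z in samples]
    # cmath.phase returns values in [-pi, pi]; real data may need unwrapping
    phase = [cmath.phase(z) for z in samples]
    return zscore(amplitude), zscore(phase)


samples = [cmath.rect(1 + 0.1 * k, 0.2 * k) for k in range(5)]
s_na, s_np = amplitude_phase_inputs(samples)
```

Normalizing each sequence separately puts amplitude and phase on a comparable scale before they are concatenated as two channels.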

To classify elements in the sequence into appropriate states (i.e., normal respiration, sleep apnea, or hypopnea), we design a seq2seq model. Figure 5 gives the architecture of the model, which consists of a CNN-based encoder, a multi-head self-attention module, a contrastive learning module, and a predictor.

Fig. 5.

The CNN-based encoder takes \({\boldsymbol{s_{m}^{na}}}\) and \({\boldsymbol{s_{m}^{np}}}\) as input and generates the embeddings of sequences, which are packed together into a matrix \(\boldsymbol{F_m}\). Then positional embeddings are added to \(\boldsymbol{F_m}\), injecting information about the relative or absolute positions of the elements in the sequence. The aggregated embeddings are fed into a multi-head self-attention module [44], extracting the implicit patterns inherent in sequences of the respiration signals, and the output is \(\boldsymbol{H_m}\). Meanwhile, a contrastive learning module is adopted to optimize the representation \(\boldsymbol{H_m}\). Last, \(\boldsymbol{H_m}\) is sent into a predictor for classifying three sleep states. The predictor generates \(\boldsymbol{\hat{P_m}} = \lbrace \boldsymbol{\hat{p_{m,1}}}, \boldsymbol{\hat{p_{m,2}}}, \ldots , \boldsymbol{\hat{p_{m,\theta }}}\rbrace ^{T}\) for \(\boldsymbol{P_m} = \lbrace \boldsymbol{p_{m,1}}, \boldsymbol{p_{m,2}}, \ldots , \boldsymbol{p_{m,\theta }}\rbrace ^T\), where \(\boldsymbol{\hat{p_{m, i}}} \in {\mathbb{R}}^{3 \times 1}\) denotes the predicted sleep state for the ith second signal in \(\boldsymbol{s_m^{^{\prime }}}\), \(\boldsymbol{p_{m, i}}\) denotes the ground-truth sleep state, and \(\theta\) is the size of the sliding window and set to 30 (seconds) in \(\boldsymbol{s_m^{^{\prime }}}\).

More specifically, we have

\[\begin{gather} \boldsymbol{S_m^{ap}} = \boldsymbol{s_m^{na}} \oplus \boldsymbol{s_m^{np}}, \end{gather}\]

(1)

\[\begin{gather} \boldsymbol{F_m} = \mathcal {F}(\boldsymbol{S_m^{ap}}), \end{gather}\]

(2)

where \(\oplus\) denotes the concatenate operation, and \(\mathcal {F}\) represents the CNN-based encoder. The output of \(\mathcal {F}\) is \(\boldsymbol{F_m} \in \mathbb {R}^{\tilde{n} \times d_{model}}\), where \(\tilde{n}\) is the sequence length and \(d_{model}\) is the dimension of sequence embeddings.

The detailed structure of \(\mathcal {F}\) is shown in Figure 6. The encoder adopts seven-layer CNNs. In layer i, we have

\[\begin{gather} \boldsymbol{Y_i} = ReLU(\boldsymbol{W_i} \otimes \boldsymbol{Z_{i-1}} + \boldsymbol{B_i}), \end{gather}\]

(3)

\[\begin{gather} \boldsymbol{Z_i}={\left\lbrace \begin{array}{l@{\quad}l} MaxPooling(\boldsymbol{Y_i}), & i=1,2,4,6 \\ BatchNorm(\boldsymbol{Y_i}), & i=3,5,7\end{array}\right.}, \end{gather}\]

(4)

where \(\otimes\) represents the convolution operation, \(\boldsymbol{Z_0} = \boldsymbol{S_m^{ap}}\) and \(\boldsymbol{F_m} = \boldsymbol{Z_7}\), \(\boldsymbol{W_i}\) and \(\boldsymbol{B_i}\) are trainable parameters of the 1D convolution operation. In particular, we treat \(\boldsymbol{s_m^{na}}\) and \(\boldsymbol{s_m^{np}}\) as two independent channels and concatenate them together before sending them into the encoder, instead of sending \(\boldsymbol{s_m^{na}}\) and \(\boldsymbol{s_m^{np}}\) into the encoder separately and concatenating their embeddings afterwards. This is because the amplitude and phase of the signal are complementary to each other, carrying their own distinct information of the real respiration waveform [25], and concatenating them together first allows the encoder to capture both kinds of information at the same time. The 1D convolution layer is adopted, because it performs very well in time-series prediction, and has the same mathematical representation as a finite impulse response filter. The 1D max pooling layer is used to reduce the computational cost of the network and provide basic translation invariance to the internal representation. Considering the characteristics of the respiration signal and the length of each \(\boldsymbol{p_{m, i}}\), we carefully design the kernel size of the 1D convolution and the 1D max pooling to ensure that \(\tilde{n} = \theta\). Each sequence embedding (i.e., each row) of \(\boldsymbol{F_m}\), which corresponds to a \(\boldsymbol{p_{m, i}}\), will be sent into the multi-head self-attention module to learn both short- and long-distance dependencies in sequences. The batch normalization layer normalizes the data in each mini-batch, which can speed up the training process. ReLU is used as the activation function as it is computationally efficient and avoids problems like vanishing or exploding gradients.
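To see how the constraint \(\tilde{n} = \theta\) can be met, the sketch below tracks only the sequence length through the seven layers of Equation (4). It is a length calculation, not the encoder itself. It assumes length-preserving convolutions, non-overlapping max pooling of size 2, and a hypothetical sampling rate of 16 samples per second; none of these values are given in this section.

```python
POOL_LAYERS = (1, 2, 4, 6)  # layers followed by max pooling, per Equation (4)
NORM_LAYERS = (3, 5, 7)     # layers followed by batch normalization


def encoder_output_length(n, pool_size=2):
    """Sequence length after the 7-layer encoder, assuming that the
    convolutions preserve length and only pooling shortens the sequence."""
    length = n
    for layer in range(1, 8):
        if layer in POOL_LAYERS:
            length //= pool_size
        # layers in NORM_LAYERS leave the length unchanged
    return length


fs, theta = 16, 30            # hypothetical rate; theta = 30 s as in the text
n = fs * theta                # 480 input samples per window
n_tilde = encoder_output_length(n)
```

Under these assumptions the four pooling stages shrink the input by a factor of 16, so each of the 30 output embeddings summarizes one second of signal.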

Fig. 6.

Considering that the self-attention mechanism is unaware of the order of the sequence, and discovering transitions between normal respiration and abnormal respiration is critical to respiration event detection, we inject a positional embedding \(\boldsymbol{PE} \in \mathbb {R}^{\tilde{n} \times d_{model}}\) into \(\boldsymbol{F_m}\) as shown in Equation (5). Particularly, we adopt a learnable position embedding instead of a fixed position embedding, as the fixed embedding leads to worse performance [23],

\begin{equation} \boldsymbol{F_m^{pe}} = \boldsymbol{F_m} + \boldsymbol{PE} = \left[\begin{array}{c} \boldsymbol{f_{m, 1}} + \boldsymbol{pe_1} \\ \boldsymbol{f_{m, 2}} + \boldsymbol{pe_2} \\ \cdots \\ \boldsymbol{f_{m, \tilde{n}}} + \boldsymbol{pe_{\tilde{n}}} \end{array}\right]. \end{equation}

(5)

The multi-head self-attention module contains b blocks. In block i, we adopt the following operations:

\[\begin{gather} \boldsymbol{Y_i^j} = softmax\left(\frac{\boldsymbol{Z_{i-1}} \boldsymbol{W_i^{Qj}}(\boldsymbol{Z_{i-1}} \boldsymbol{W_i^{Kj}})^T}{\sqrt {d_{model}/hd}}\right)\boldsymbol{Z_{i-1}} \boldsymbol{W_i^{Vj}}, \end{gather}\]

(6)

\[\begin{gather} \boldsymbol{Y_i} = \boldsymbol{Y_i^1} \oplus \boldsymbol{Y_i^2} \oplus \cdots \oplus \boldsymbol{Y_i^{hd}}, \end{gather}\]

(7)

\[\begin{gather} \boldsymbol{U_i} = LayerNorm(dropout(MLP(\boldsymbol{Y_i})) + \boldsymbol{Z_{i-1}}), \end{gather}\]

(8)

\[\begin{gather} \boldsymbol{V_i} = dropout(ReLU(MLP(\boldsymbol{U_i}))), \end{gather}\]

(9)

\[\begin{gather} \boldsymbol{Z_i} = LayerNorm(dropout(MLP(\boldsymbol{V_i})) + \boldsymbol{U_i}), \end{gather}\]

(10)

where \(\boldsymbol{Z_0} = \boldsymbol{F_m^{pe}}\), hd is the number of attention heads, \(\boldsymbol{W_i^{Qj}}, \boldsymbol{W_i^{Kj}}\) and \(\boldsymbol{W_i^{Vj}}\) are trainable parameters and \(j \in [1, hd]\). As shown above, we first apply the multi-head attention technique to obtain the attention-enhanced data \(\boldsymbol{Y_i}\), and then use MLP, dropout, skip connection, and layer normalization to obtain \(\boldsymbol{H_m}\) (i.e., \(\boldsymbol{H_m} = \boldsymbol{Z_b}\)).
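To make Equations (6) and (7) concrete, the following is a minimal pure-Python sketch of the per-head scaled dot-product attention and the concatenation of heads. It is an illustration only, not the authors' implementation: the function names (`matmul`, `softmax_rows`, `attention_head`, `multi_head`) are hypothetical, matrices are plain lists of rows, and the MLP, dropout, skip-connection, and layer-normalization steps of Equations (8) to (10) are omitted.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention_head(z, w_q, w_k, w_v):
    """One head of Equation (6): softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(z, w_q), matmul(z, w_k), matmul(z, w_v)
    d_k = len(q[0])  # plays the role of d_model / hd in the paper's notation
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)

def multi_head(z, heads):
    """Equation (7): concatenate the per-head outputs along the feature axis."""
    outputs = [attention_head(z, *h) for h in heads]
    return [sum((o[t] for o in outputs), []) for t in range(len(z))]
```

With the settings reported later in the paper (\(d_{model} = 64\), hd = 4), each head would project to 16 dimensions, so the scaling factor in Equation (6) would be \(\sqrt{16} = 4\).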

To enhance the generalization of the model and reduce the interference from noise in the signals, we design a contrastive learning module, which consists of a CNN-based encoder and a multi-head self-attention module (b blocks) as described above. This module also takes \(\boldsymbol{s_m^{na}}\) and \(\boldsymbol{s_m^{np}}\) as input and generates the representation \(\boldsymbol{\tilde{{H_m}}}\), with only the dropout in the self-attention module used as noise [15]. In other words, we send the same signals to the CNN-based encoder and self-attention module twice. Because dropout is applied independently in each pass, we obtain two different representations, \(\boldsymbol{H_m}\) and \(\boldsymbol{\tilde{{H_m}}}\), which form “a positive pair.” A robust representation can be learned in the latent space by minimizing the gap between \(\boldsymbol{H_m}\) and \(\boldsymbol{\tilde{{H_m}}}\). The contrastive loss function is shown in Equation (11):

\begin{equation} L_{contrastive}(\boldsymbol{h_m}, \boldsymbol{\tilde{h_m}}) = -log\frac{exp(sim(\boldsymbol{h_m}, \boldsymbol{\tilde{h_m}})/\tau)}{exp(sim(\boldsymbol{h_m}, \boldsymbol{\tilde{h_m}})/\tau) + \sum _{i \ne m}^{2N}exp(sim(\boldsymbol{h_m}, \boldsymbol{h_i})/\tau)}, \end{equation}

(11)

where \(\boldsymbol{h_m} = AvgPooling(\boldsymbol{H_m})\), \(\boldsymbol{\tilde{h_m}} = AvgPooling(\boldsymbol{\tilde{H_m}})\), \(\tau\) is a temperature hyper-parameter, N denotes the batch size, and the cosine similarity is adopted to calculate the similarity of two representations, i.e., \(sim(\boldsymbol{h_a}, \boldsymbol{h_b}) = \frac{\boldsymbol{h_a}^T \boldsymbol{h_b}}{||\boldsymbol{h_a}||\cdot ||\boldsymbol{h_b}||}\). As described above, we create “positive pairs” by applying dropout to the same \(\theta\)-second signals twice in a batch, and we take the other signals in the same batch as “negatives.” Therefore, the training objective of Equation (11) is to minimize the gap between the representations of “the positive pair” (\(\boldsymbol{H_m}\) and \(\boldsymbol{\tilde{H_m}}\)), and to maximize the gap between the representations of “the negative pairs” (\(\boldsymbol{H_m}\) and \(\boldsymbol{H_i}\), where \(i\ne m\)).
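The loss in Equation (11) can be sketched for a single anchor as follows. This is a minimal illustration under stated assumptions rather than the authors' code: `cosine_sim` and `contrastive_loss` are hypothetical names, the pooled representations are plain lists of floats, and the in-batch negatives are passed in explicitly as a list.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two vectors, as defined after Equation (11)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(h, h_tilde, negatives, tau=1.0):
    """Equation (11) for one anchor h, its positive h_tilde, and in-batch negatives."""
    pos = math.exp(cosine_sim(h, h_tilde) / tau)
    neg = sum(math.exp(cosine_sim(h, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))
```

The loss approaches zero when the positive pair is far more similar than any negative, and a smaller \(\tau\) sharpens that contrast.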

The predictor contains the following operations:

\begin{equation} \boldsymbol{\hat{P_m}} = MLP(dropout(ReLU(MLP(\boldsymbol{H_m})))), \end{equation}

(12)

where \(\boldsymbol{\hat{P_m}} = \lbrace \boldsymbol{\hat{p_{m,1}}}, \boldsymbol{\hat{p_{m,2}}}, \ldots , \boldsymbol{\hat{p_{m,\theta }}}\rbrace ^T \in \mathbb {R}^{\theta \times 3}\). The predictor classifies each element in the sequence into one of three sleep states (normal respiration, sleep apnea, or hypopnea). It is worth noting that the classification result is second-grained (i.e., each 1-second signal segment is assigned a sleep state).

We adopt the following cross-entropy loss as the loss function for classification:

\begin{equation} L_{cross\_entropy}(\boldsymbol{P_m}, \hat{\boldsymbol{P_m}}) = -\frac{1}{\theta }\sum _{i=1}^{\theta }\sum _{k=1}^{3}p_{m,i}^k log\frac{exp(\hat{p_{m,i}^k})}{\sum _{j=1}^{3}exp(\hat{p_{m, i}^{j}})}. \end{equation}

(13)

Last, the total loss function is shown in Equation (14), where \(\lambda _c\) is a hyper-parameter,

\begin{equation} L = L_{cross\_entropy} + \lambda _c L_{contrastive}. \end{equation}

(14)
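Equations (13) and (14) can be illustrated with the short sketch below. The names `cross_entropy` and `total_loss` are hypothetical, the targets are assumed to be one-hot rows of shape \(\theta \times 3\), and the default \(\lambda_c = 0.1\) is the value reported in the experimental setup.

```python
import math

def cross_entropy(targets, logits):
    """Equation (13): softmax cross-entropy averaged over the theta seconds.

    targets: one-hot rows (theta x 3); logits: raw predictor outputs (theta x 3).
    """
    total = 0.0
    for p, p_hat in zip(targets, logits):
        peak = max(p_hat)
        # log of the softmax denominator, computed stably
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in p_hat))
        total -= sum(t * (v - log_norm) for t, v in zip(p, p_hat))
    return total / len(targets)

def total_loss(l_ce, l_contrastive, lambda_c=0.1):
    """Equation (14): classification loss plus the weighted contrastive loss."""
    return l_ce + lambda_c * l_contrastive
```

For example, uniform logits over the three classes give a cross-entropy of \(\log 3\) regardless of the true class.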

3.3.2 Aggregating Respiration Events.

Given \(\boldsymbol{s_m^{^{\prime }}}\), the model above outputs a prediction sequence in which each element is the predicted sleep state (normal respiration, sleep apnea, or hypopnea) for one second. As described above, we adopt a \(\theta\)-second sliding window with a \(\delta\)-second sliding step to segment the overnight data. If \(\theta \gt \delta\), then \(\boldsymbol{s_m^{^{\prime }}}\) and \(\boldsymbol{s_{m+i}^{^{\prime }}}\) overlap for \(\theta - i \times \delta\) seconds (whenever this quantity is positive). Since all these data are fed into the model for prediction, the data of a given second receive multiple predictions. In response, we adopt a multi-window voting mechanism to decide which sleep state the subject is in at each second, i.e., we adopt the most voted state as the final state for each second. After this, we obtain a sequence in which each element is the predicted state for a 1-second signal and whose length is the sleep duration (in seconds) of the subject.
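The multi-window voting step can be sketched as below. This is a minimal illustration, not the authors' implementation: `vote_per_second` is a hypothetical name, window m is assumed to start at second m × step, and ties are broken by first-seen order because the paper does not specify a tie rule.

```python
from collections import Counter

def vote_per_second(window_preds, step, total_seconds):
    """Merge overlapping window predictions by majority vote per second.

    window_preds[m][t] is the predicted state for second t of window m,
    where window m is assumed to start at second m * step.
    """
    ballots = [Counter() for _ in range(total_seconds)]
    for m, preds in enumerate(window_preds):
        for t, state in enumerate(preds):
            second = m * step + t
            if second < total_seconds:
                ballots[second][state] += 1
    # most_common(1) breaks ties by first-seen order (an assumption of this sketch).
    return [b.most_common(1)[0][0] if b else None for b in ballots]
```

With three 3-second windows at a 1-second step, the middle second is covered by all three windows, so a single dissenting window is outvoted.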

In general, a respiration event is detected when there are ten or more consecutive abnormal sleep state predictions (sleep apnea or hypopnea) in the overnight prediction sequence. However, sporadic predictions of normal respiration can appear among continuous abnormal predictions, splitting them into runs shorter than 10 seconds that cannot be aggregated into a complete event, so the respiration event is missed. To solve this problem, we employ a sliding window of 10 seconds to slide over the overnight state prediction sequence with a sliding step of 1 second. We rectify the predictions in the sliding window to the corresponding abnormal sleep state when all of the following three conditions are satisfied: (i) the first prediction in the window or its previous one (which is outside the window) is an abnormal sleep state, (ii) the last prediction in the window or its next one (which is outside the window) is an abnormal sleep state, and (iii) more than \(70\%\) of the predictions in the window are the abnormal sleep state.

After rectifying, we detect respiration events over the rectified prediction sequence and calculate the total number of different respiration events.
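The rectification and event-counting steps can be sketched as follows. Several details are assumptions of this sketch rather than statements from the paper: the function names `rectify` and `count_events` are hypothetical, the "corresponding" abnormal state is taken to be the most frequent abnormal label in the window, the three conditions are evaluated on the unrectified sequence, and an event is counted as a run of one abnormal label.

```python
from collections import Counter
from itertools import groupby

ABNORMAL = {"apnea", "hypopnea"}

def rectify(preds, window=10, min_abnormal_pct=70):
    """Fill sporadic normal predictions inside mostly-abnormal 10-second windows."""
    out = list(preds)
    for start in range(len(preds) - window + 1):
        end = start + window  # exclusive
        win = preds[start:end]
        # (i) first prediction in the window, or the one just before it, is abnormal
        head_ok = win[0] in ABNORMAL or (start > 0 and preds[start - 1] in ABNORMAL)
        # (ii) last prediction in the window, or the one just after it, is abnormal
        tail_ok = win[-1] in ABNORMAL or (end < len(preds) and preds[end] in ABNORMAL)
        abnormal = [p for p in win if p in ABNORMAL]
        # (iii) strictly more than 70% of the window is abnormal (integer arithmetic)
        if head_ok and tail_ok and 100 * len(abnormal) > min_abnormal_pct * window:
            # Assumption: rectify to the most frequent abnormal label in the window.
            label = Counter(abnormal).most_common(1)[0][0]
            out[start:end] = [label] * window
    return out

def count_events(preds, min_len=10):
    """Count runs of one abnormal label lasting at least min_len seconds."""
    counts = Counter()
    for label, run in groupby(preds):
        if label in ABNORMAL and len(list(run)) >= min_len:
            counts[label] += 1
    return counts
```

For instance, two 5-second apnea runs separated by a single normal prediction yield no event before rectification and one apnea event afterwards.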

3.3.3 Estimating AHI and Diagnosing SAHS.

To calculate AHI, we first identify the period when the subject is awake with the help of the body motion distribution during overnight sleep, and estimate the total sleep time from it.

We consider the subject to be awake when any one of the following three conditions is satisfied: (i) given \(\boldsymbol{r_m^{^{\prime }}}\), its window depth is greater than a pre-defined threshold \(\alpha _3\) (\(\alpha _3 \gt \alpha _1\)), which means that the body motions are intense enough to indicate that the subject is awake; (ii) the duration between two State IIIs is within 1.5 minutes, i.e., the subject has been performing continuous motions for a certain period of time; and (iii) the subject is not in the bed (i.e., State I). We sum up these periods to obtain the total waking time after the subject falls asleep. Then, we subtract the total waking time from the sleep duration (i.e., \(T_e - T_o\)) to obtain the total sleep time.
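One possible reading of the three wake conditions is sketched below. It rests on several assumptions that the text does not confirm: states and window depths are available as per-second lists (`states`, `depths`), a "State III" is treated as a maximal run of consecutive State III seconds, the gap in condition (ii) is measured from the end of one run to the start of the next, and the whole span covering both runs is then counted as awake. All names are hypothetical.

```python
from itertools import groupby

def motion_episodes(states):
    """Return (start, end) second indices, end exclusive, of each State III run."""
    episodes, t = [], 0
    for label, run in groupby(states):
        length = len(list(run))
        if label == "III":
            episodes.append((t, t + length))
        t += length
    return episodes

def awake_mask(states, depths, alpha3=3.0, max_gap=90):
    """Per-second awake flags derived from conditions (i) to (iii)."""
    # (i) intense motion: depth above alpha3; (iii) out of bed: State I
    awake = [s == "I" or d > alpha3 for s, d in zip(states, depths)]
    # (ii) two motion episodes separated by at most max_gap seconds
    episodes = motion_episodes(states)
    for (s1, e1), (s2, e2) in zip(episodes, episodes[1:]):
        if s2 - e1 <= max_gap:
            for t in range(s1, e2):
                awake[t] = True
    return awake

def total_sleep_seconds(states, depths, **kwargs):
    """Sleep duration (T_e - T_o, here len(states)) minus the total waking time."""
    return len(states) - sum(awake_mask(states, depths, **kwargs))
```

The default `alpha3=3.0` and `max_gap=90` mirror the reported \(\alpha_3 = 3\) and the 1.5-minute limit.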

Last, AHI is calculated by Equation (15). Based on AHI, we can diagnose the SAHS severity (normal, mild, moderate, and severe) of the subject,

\begin{equation} AHI = \frac{\#CA + \#OA + \#MA + \#HY}{total\ sleep\ time}, \end{equation}

(15)

where \(\#CA\), \(\#OA\), \(\#MA\), and \(\#HY\) denote the number of central apnea, obstructive apnea, mixed apnea, and hypopnea events during overnight sleep, respectively.
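Equation (15) can be written as a small helper. The sketch assumes the total sleep time is expressed in hours, which is the conventional unit for AHI (events per hour of sleep); the function name `ahi` and its parameter names are hypothetical.

```python
def ahi(n_ca, n_oa, n_ma, n_hy, total_sleep_hours):
    """Equation (15): respiration events per hour of sleep (hours are an assumption)."""
    if total_sleep_hours <= 0:
        raise ValueError("total sleep time must be positive")
    return (n_ca + n_oa + n_ma + n_hy) / total_sleep_hours
```

For example, 60 events over 6 hours of sleep give an AHI of 10.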

4 Evaluation

4.1 Experimental Setup

We adopt a commercial UWB module, the XETHRU X4M05, as the front-end of Respnea. The module has a center frequency of 7.3 GHz and a bandwidth of 1.4 GHz. We set the frame rate to 17 frames per second in consideration of the range of respiration rate and the signal-to-noise ratio. The UWB module is connected to a Raspberry Pi 4B, and both of them are packaged as a compact device, as shown in Figure 7.

Fig. 7.

We implement the overnight respiration profiling component in Matlab. \(\alpha _1\), \(\alpha _2\), and \(\alpha _3\) are set to 1, 0.15, and 3, respectively. \(\beta\) is set to 0.4. Also, we implement the respiration event detection component in Python. The layer number of CNN-based encoder \(L_c\) is set to 7, and the layer number of self-attention \(L_s\) is set to 2. The dimension of sequence embeddings \(d_{model}\) is set to 64. The number of heads hd is set to 4. The number of blocks b is set to 2. The temperature parameter \(\tau\) is set to 1. \(\lambda _c\) in the loss function is set to 0.1. We use Adam as the optimizer with a learning rate of 0.001. The batch size N is set to 128.

We deploy Respnea in hospital and home scenarios to produce two datasets as follows, and the data collection procedures have been approved by the institutional review board of Institute of Software, Chinese Academy of Sciences and the medical ethics committee of the Second Affiliated Hospital of Xi’an Jiaotong University, respectively.

In-hospital Dataset. Our in-hospital experiments are conducted in seven hospital wards, and the experimental scenario in the hospital ward is shown in Figure 10. Our device is placed on a nightstand and orientated towards the subject. We collect the data of 18 nights (151 h in total) from 18 subjects aged between 5 and 58, including 5 females and 13 males. PSG sensors are used to monitor the overnight sleep of the subjects. By analyzing the PSG data, the medical technicians can diagnose respiration events, providing the ground truth, including the total number of respiration events, the start time and end time of each event, the total sleep time, and the severity of SAHS. Our in-hospital dataset includes 9 healthy subjects, 1 mild-SAHS subject, 3 moderate-SAHS subjects, and 5 severe-SAHS subjects.

Fig. 8.

Fig. 9.

Fig. 10.

In-home Dataset. Our in-home experiments are conducted in 7 home rooms, and the experimental scenario in the home room is shown in Figure 9. Our device is placed beside the bed and orientated towards the sleeper. We recruit 7 volunteers (1 female and 6 males) aged between 22 and 34, and collect the data of 17 nights (108 h in total). As for ground truth, we adopt a three-lead sleep monitor, i.e., Heal Force PC-3000 as shown in Figure 8, to record respiration rates. Meanwhile, we use an infrared camera (i.e., EZVIZ C6CN camera) to record the overnight body motions of the sleeper.

4.2 Performance on Overnight Respiration Profiling

In this subsection, we conduct experiments on the in-hospital dataset and in-home dataset to evaluate the performance of Respnea on overnight respiration profiling.

4.2.1 Respiration Rate Estimation Evaluation.

The estimation error of respiration rate is defined as the absolute value of the difference between the estimated respiration rate \(R_E\) and the ground truth \(R_G\), i.e., \(|R_E-R_G|\). We select two non-intrusive respiration monitoring methods (Raheel et al. [34] and Liu et al. [26]) as the baselines. Raheel et al. adopt a UWB device to monitor the respiration rates of stationary subjects; Liu et al. exploit Wi-Fi signals to monitor the respiration rates of the subject during sleep. We implement these two methods and apply them to our datasets. The cumulative distribution functions (CDFs) of the respiration rate estimation errors of the three methods are shown in Figure 11 (in-hospital dataset) and Figure 12 (in-home dataset). Meanwhile, Table 2 shows the median errors and 90-quantile errors of the three methods on both datasets. The experimental results show that Respnea achieves the lowest median error and 90-quantile error on both datasets. In particular, we find that some respiration rate errors of the baselines are large. Presumably, this is caused by two factors. (i) The baselines use either amplitude or phase sequences to estimate the respiration rates. However, when only amplitude or only phase information is used, human respiration cannot be sensed effectively in blind-spot locations [57]. Respnea solves this problem by combining both amplitude and phase sequences to estimate respiration rates. (ii) The baselines fail to detect some body motions and perform respiration rate estimation on the affected waveforms, causing large estimation errors. Figure 13 shows the respiration rates of a patient within a certain interval of a night using the three methods. It can be seen that Respnea fits the ground truth the best. A similar result is obtained on the in-home dataset, as shown in Figure 14.
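The error metrics used here can be computed as in the sketch below. The nearest-rank definition of the 90-quantile is an assumption, since the paper does not state which quantile estimator is used, and the function names are hypothetical.

```python
import math
import statistics

def abs_errors(estimated, ground_truth):
    """Per-sample estimation error |R_E - R_G| in bpm."""
    return [abs(e - g) for e, g in zip(estimated, ground_truth)]

def quantile_nearest_rank(values, q):
    """Nearest-rank quantile: smallest value with at least a fraction q at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def summarize(estimated, ground_truth):
    """Return (median error, 90-quantile error) for paired rate estimates."""
    errs = abs_errors(estimated, ground_truth)
    return statistics.median(errs), quantile_nearest_rank(errs, 0.9)
```

A single badly estimated interval barely moves the median but dominates the 90-quantile, which is why both are reported.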

Table 2.

Dataset	Method	Median error (bpm)	90-quantile error (bpm)	Temporal coverage (%)
In-hospital dataset	Respnea	0.22	0.57	67.42
	Raheel et al.	1.80	3.11	63.05
	Liu et al.	0.27	2.07	65.70
In-home dataset	Respnea	0.27	0.74	82.99
	Raheel et al.	1.48	2.68	74.56
	Liu et al.	0.65	2.30	76.33

Table 2. The Performance on Respiration Rate Estimation

The best result in each column is in bold.

Fig. 11.

Fig. 12.

Fig. 13.

Fig. 14.

4.2.2 Respiration Rate Coverage Evaluation.

The temporal coverage of respiration rate is defined as the ratio of the duration in which the subject is in State IV to the sleep duration. Compared to methods that only use amplitude signals or phase signals, our approach improves the temporal coverage of the respiration rate by combining amplitude and phase signals. Table 2 shows the average temporal coverage of the three methods on both datasets. It can be seen that Respnea outperforms the other two methods in temporal coverage of respiration rate while maintaining low errors in respiration rate estimation. Meanwhile, we also find that the temporal coverage on the in-hospital dataset is noticeably lower than that on the in-home dataset. This is because the subjects of the in-home dataset are healthy and spend a longer time in State IV than the subjects of the in-hospital dataset. In addition, the estimation error of our approach on the in-hospital dataset is smaller than that on the in-home dataset. This fact needs to be analyzed together with the results on temporal coverage. Specifically, on the in-hospital dataset, the estimation error is lower, but the duration for which the respiration rate estimation can be performed is shorter. On the in-home dataset, the opposite is true. This is in line with our expectations.
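As a small illustration of this definition, temporal coverage can be computed from a per-second state sequence. The per-second representation and the name `temporal_coverage` are assumptions of this sketch.

```python
def temporal_coverage(states):
    """Percentage of the sleep duration spent in State IV."""
    if not states:
        raise ValueError("empty state sequence")
    return 100.0 * sum(s == "IV" for s in states) / len(states)
```

For example, a sequence with three seconds in State IV and one second in State III has a coverage of 75%.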

4.2.3 Body Motion Detection Evaluation.

We use the motion sensor data from the PSG devices as the ground truth of body motions (State III) for the in-hospital dataset, and the videos recorded by the infrared camera as the ground truth of body motions for the in-home dataset. Table 3 summarizes the body motion detection results of Respnea on both datasets. The experimental results show that Respnea attains high F1-scores on detecting body motions in both the in-hospital scenario (0.9175) and the in-home scenario (0.9315), indicating that Respnea can accurately detect body motions of different subjects in different scenarios.

Table 3.

Dataset	Precision	Recall	F1-score
In-hospital dataset	0.9670	0.8727	0.9175
In-home dataset	0.9813	0.8864	0.9315

Table 3. The Performance on Body Motion Detection
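The F1-scores in Table 3 follow from the standard harmonic mean of precision and recall, as the sketch below checks against the reported values. The function names are hypothetical, and the small residual differences are attributed here to rounding of the published precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_score(precision, recall)

# Reported (precision, recall) pairs from Table 3
in_hospital_f1 = f1_score(0.9670, 0.8727)
in_home_f1 = f1_score(0.9813, 0.8864)
```

Both recomputed values agree with the table to within 0.001.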

In short, Respnea attains low errors and high temporal coverage in respiration rate estimation, and achieves excellent performance on body motion detection in both in-hospital and in-home datasets. The experimental results illustrate the effectiveness and generalization of Respnea on overnight respiration profiling.

4.3 Performance on Respiration Event Detection

In this subsection, we use in-hospital data to evaluate the performance of Respnea on respiration event detection.

We divide the in-hospital dataset into 18 subsets, each containing one subject’s data. Each time, one subset is chosen for testing, another subset is chosen from the remaining 17 subsets for validation, and the other 16 subsets are used for training. This procedure is repeated 18 times, yielding the SAHS severity for every subject; the final performance of the model is the average of the 18 results.
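The leave-one-subject-out protocol can be sketched as below. The paper does not say how the validation subject is picked, so taking the next subject in cyclic order is an assumption of this sketch, as is the name `leave_one_subject_out`.

```python
def leave_one_subject_out(subject_ids):
    """Yield (train, validation, test) subject splits, one per held-out test subject."""
    n = len(subject_ids)
    for i, test in enumerate(subject_ids):
        # Assumption: the validation subject is the next one in cyclic order.
        val = subject_ids[(i + 1) % n]
        train = [s for s in subject_ids if s not in (test, val)]
        yield train, val, test
```

With 18 subjects this produces 18 folds, each with 16 training subjects, 1 validation subject, and 1 test subject.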

We label the signal data per second and obtain 49,724 labels of sleep apnea events, 18,923 labels of hypopnea events, and 480,173 labels of normal respiration. To alleviate the class imbalance problem, we adopt two different strategies: (i) for the training set and validation set, we first locate all respiration events based on ground truth. For each event, we obtain a sequence from the UWB signals, whose duration is from 10 seconds before the start of the event to 10 seconds after the end of the event. This sequence contains not only a complete respiration event, but also the transition from normal respiration to a respiration event and the reversed transition. We adopt a 30-second sliding window with a 1-second sliding step to segment the sequence into multiple respiration event samples. Meanwhile, we use a 30-second sliding window with a 10-second sliding step to segment the sequence without any respiration events, obtaining the normal respiration samples. In this way, the training set contains 820,775 labels of sleep apnea events, 292,743 labels of hypopnea events, and 1,532,482 labels of normal respiration in the end; (ii) for the testing set, we produce samples by sliding a 30-second window over the sleep signals with a step of 10 seconds.
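The two segmentation settings differ only in the sliding step, as the following sketch shows. It operates on a per-second sequence for simplicity (the raw UWB frames arrive at 17 frames per second), and the name `sliding_windows` is hypothetical.

```python
def sliding_windows(seq, window=30, step=10):
    """Return every full window of `window` samples, advancing by `step` samples."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, step)]
```

For example, a 15-second event padded with 10 seconds on each side forms a 35-second sequence, which yields 35 - 30 + 1 = 6 samples at a 1-second step but only one sample at a 10-second step. This is how the denser step oversamples the rarer abnormal classes.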

Subsequently, we evaluate the performance of Respnea by estimating the total number of respiration events, identifying the total sleep time, and diagnosing the SAHS severity of each subject.

4.3.1 Estimating the Number of Respiration Events.

Figure 15 shows the scatter plot of 18 subjects comparing the total number of respiration events estimated by Respnea to that of ground truth. The intraclass correlation coefficient [6] between Respnea and the ground truth is 0.9620, indicating that the total number of respiration events estimated by Respnea and the ground truth are highly correlated.

Fig. 15.

4.3.2 Estimating the Total Sleep Time.

Figure 16 shows the total sleep time estimated by Respnea and the ground truth (estimated by EEG sensors). The median error of Respnea is 21 minutes, and the mean error is 49 minutes. Some subjects show relatively large errors for the following reasons: (i) some subjects wake up several times throughout the night, and during each waking period they remain stationary and try to fall asleep, which leads Respnea to overestimate the total sleep time; (ii) some severe-SAHS subjects move frequently while respiration events occur continuously, yet they stay asleep, which leads Respnea to underestimate the total sleep time. However, these errors are acceptable from the viewpoint of the classification accuracy of SAHS severity.

Fig. 16.

4.3.3 Identifying SAHS Severity.

We estimate AHI using Equation (15), by which the subject can be diagnosed as normal (AHI \(\lt\) 5), mild-SAHS (5 \(\le\) AHI \(\lt\) 15), moderate-SAHS (15 \(\le\) AHI \(\lt\) 30), and severe-SAHS (AHI \(\ge\) 30). Table 4 shows the confusion matrix of the SAHS severity classification by Respnea. Among the 18 subjects, we correctly predict the SAHS severity of 17 subjects, reaching an accuracy of \(94.44\%\). Meanwhile, we select three non-intrusive methods proposed by Nandakumar et al. [31], Romero et al. [38], and Kang et al. [22] as the baselines. Nandakumar et al. use hand-crafted rules for detecting respiration events, Romero et al. adopt a multi-layer CNN model and Kang et al. adopt a CNN-LSTM model for apnea detection. Experimental results of the three methods on the SAHS severity classification are shown in Table 5. It can be seen that Respnea outperforms the other three methods in all metrics.
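The severity thresholds stated above map directly onto a small function; the name `sahs_severity` is hypothetical.

```python
def sahs_severity(ahi_value):
    """Map an AHI value to an SAHS severity class using the thresholds in the text."""
    if ahi_value < 5:
        return "normal"
    if ahi_value < 15:
        return "mild"
    if ahi_value < 30:
        return "moderate"
    return "severe"
```

Because the boundaries are hard thresholds, a subject whose true AHI lies just below 5 can be pushed into the mild class by a small overestimate of the event count or underestimate of the sleep time.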

Table 4.

Ground truth/Prediction	Normal	Mild	Moderate	Severe
Normal	0.8889	0.1111	0	0
Mild	0	1	0	0
Moderate	0	0	1	0
Severe	0	0	0	1

Table 4. Confusion Matrix of SAHS Severity Classification

Table 5.

Method	Accuracy	Precision	Recall	F1-score
Nandakumar et al.	0.6667	0.7702	0.6667	0.6754
Romero et al.	0.7222	0.7222	0.7222	0.7222
Kang et al.	0.7778	0.6869	0.7778	0.7278
Respnea	0.9444	0.9722	0.9444	0.9521

Table 5. The Performance on Diagnosing SAHS Severity

The best result in each column is in bold.

4.4 Ablation Study

In this subsection, we evaluate the effectiveness of our CNN-based encoder module, multi-head self-attention module, contrastive learning module, and the amplitude-phase combining technique with the ablation study.

We derive five variants from Respnea as follows:

—

Respnea-CNN: Respnea without the CNN-based encoder.

—

Respnea-SelfAtt: Respnea without the multi-head self-attention module.

—

Respnea-Contra: Respnea without the contrastive learning module.

—

Respnea-Amp: Respnea without the amplitude signals; only the phase signals are sent to the model as input.

—

Respnea-Pha: Respnea without the phase signals; only the amplitude signals are sent to the model as input.

Table 6 summarizes the comparison results of Respnea and its five variants on diagnosing the SAHS severity. We can see that both Respnea-CNN and Respnea-SelfAtt suffer severe declines in performance, demonstrating that both the CNN-based encoder and multi-head self-attention module are effective in improving the performance. The performance of Respnea-Contra is lower than that of Respnea, indicating that the contrastive learning module can help extract a robust representation. Further, the results of Respnea-Amp and Respnea-Pha show that both amplitude and phase information of the UWB signals provide distinct information, and combining them can help improve the model performance. Meanwhile, we find that both Respnea-Amp and Respnea-Pha have better performance than Respnea-CNN and Respnea-SelfAtt, indicating the effectiveness of the model architecture in Respnea.

Table 6.

Method	Accuracy	Precision	Recall	F1-score
Respnea-CNN	0.6111	0.9514	0.6111	0.6997
Respnea-SelfAtt	0.6111	0.7929	0.6111	0.6244
Respnea-Contra	0.8889	0.9630	0.8889	0.9095
Respnea-Amp	0.8333	0.9028	0.8333	0.8581
Respnea-Pha	0.7778	0.7361	0.7778	0.7444
Respnea	0.9444	0.9722	0.9444	0.9521

Table 6. The Performance of Respnea and Variants

The best result in each column is in bold.

4.5 Impact of Different Hyper-parameters

4.5.1 Overnight Respiration Profiling.

We first analyze the sensitivity of the hyper-parameters in the overnight respiration profiling component, including the threshold \(\alpha _1\) to determine whether the subject performs obvious body motions, the threshold \(\alpha _2\) to determine whether the subject performs moderate body motions, the threshold \(\alpha _3\) to determine whether the subject is awake, and the threshold \(\beta\) to determine whether the subject is stationary and breathing regularly. These parameters will impact the performance of respiration rate estimation by identifying the subject states, and the SAHS classification by affecting the total sleep time. The default values of \(\alpha _1\), \(\alpha _2\), \(\alpha _3\), and \(\beta\) are 1, 0.15, 3, and 0.4. We vary the value of one parameter while keeping the other parameters fixed, with the restriction of \(\alpha _2 \lt \alpha _1 \lt \alpha _3\). The experimental results are shown in Table 7, from which we have the following findings:

Table 7.

Hyper-parameter	Value	Respiration rate: median error (bpm)	Respiration rate: 90-Q error (bpm)	Respiration rate: temporal coverage (%)	Motion: F1-score	SAHS: accuracy
\(\alpha _1\)	0.15	0.2198	0.5416	51.16	0.7705	0.7778
	0.5	0.2236	0.5587	64.56	0.8902	0.8889
	1	0.2241	0.5702	67.42	0.9175	0.9444
	1.5	0.2262	0.5711	68.38	0.9014	0.8889
	2	0.2265	0.5719	68.89	0.8919	0.8889
\(\alpha _2\)	0.05	0.2241	0.5702	67.42	0.8632	0.8333
	0.1	0.2241	0.5702	67.42	0.9013	0.8889
	0.15	0.2241	0.5702	67.42	0.9175	0.9444
	0.5	0.2241	0.5702	67.42	0.7460	0.7778
	1	0.2241	0.5702	67.42	0.6357	0.6667
\(\alpha _3\)	1	0.2241	0.5702	67.42	0.9175	0.8889
	2	0.2241	0.5702	67.42	0.9175	0.8889
	3	0.2241	0.5702	67.42	0.9175	0.9444
	4	0.2241	0.5702	67.42	0.9175	0.9444
	5	0.2241	0.5702	67.42	0.9175	0.8889
\(\beta\)	0.2	0.2277	0.5977	79.75	0.8105	0.8333
	0.3	0.2273	0.5815	74.26	0.8780	0.8889
	0.4	0.2241	0.5702	67.42	0.9175	0.9444
	0.5	0.2236	0.5518	59.10	0.8950	0.8889
	0.6	0.2198	0.5307	48.17	0.8676	0.8889

Table 7. Performance with Different Hyper-parameters in the Overnight Respiration Profiling Component

The best result in each column is in bold.

—

Threshold of obvious body motions \(\alpha _1\). As \(\alpha _1\) increases, the subject is less likely to be identified as in the obvious body motion state. The experimental results show that our system achieves the optimal performance on body motion detection and SAHS classification with \(\alpha _1\) = 1. Also, the compromise between estimation error and temporal coverage of respiration rate is reached by setting \(\alpha _1\) to 1. We can observe that the performance of body motion detection and SAHS classification tends to be stable when \(\alpha _1\) is relatively large, and decreases sharply when \(\alpha _1\) is relatively small. This is because the large \(\alpha _1\) tends to recognize a subject as being in a state of moderate motion rather than obvious motion, and the small \(\alpha _1\) leads to identifying a subject as being in a motion state rather than a stationary state. The former does not affect the performance of the motion detection, but the latter does.

—

Threshold of moderate body motions \(\alpha _2\). The results show that the performance of body motion detection and SAHS classification gradually increases as \(\alpha _2\) varies from 0.05 to 0.15, and decreases as \(\alpha _2\) continues to increase. The reason is that the higher threshold of \(\alpha _2\) leads to more missed detections of moderate body motions. Also, the value of \(\alpha _2\) does not impact the respiration evaluation part as this parameter is only used in detecting moderate motions and is irrelevant to the estimation of respiration rate.

—

Threshold of wake state \(\alpha _3\). This parameter is designed to decide whether the body motions are intense enough to indicate the subject is awake, and as \(\alpha _3\) increases, the subject is less likely to be identified as in the wake state. We vary the value of \(\alpha _3\) and the SAHS classification achieves the best performance when \(\alpha _3\) is set to 3 or 4. This parameter is irrelevant to the estimation of respiration rate and body motion, because we start to identify the wake state after finishing the respiration rate estimation and body motion detection in our algorithm.

—

Threshold of respiration detectable state \(\beta\). As can be seen from Table 7, a smaller \(\beta\) increases the temporal coverage of respiration rates but results in larger estimation errors, while a larger \(\beta\) has the opposite effect. Therefore, we make a tradeoff and set \(\beta\) to 0.4, at which the body motion detection and SAHS classification also achieve the best performance.

4.5.2 Respiration Event Detection.

We now observe how the hyper-parameters in the respiration event detection component affect the diagnosing performance, including the layer number of CNN-based encoder \(L_c\), the layer number of self-attention \(L_s\), the dimension of sequence embeddings \(d_{model}\), the temperature parameter \(\tau\) of the contrastive learning module, and the batch size N. Table 8 shows the performance of Respnea with one of the hyper-parameters varying while keeping other hyper-parameters at their optimal settings. From Table 8, we have the following observations:

Table 8.

Hyper-parameter	Value	Accuracy	Precision	Recall	F1-score
\(L_c\)	1	0.7222	0.7870	0.7222	0.7312
	3	0.6667	0.7870	0.6667	0.7003
	5	0.7222	0.9537	0.7222	0.7770
	7	0.9444	0.9722	0.9444	0.9521
	9	0.8333	0.8722	0.8333	0.8192
\(L_s\)	1	0.8333	0.7917	0.8333	0.8000
	2	0.9444	0.9722	0.9444	0.9521
	3	0.6667	0.7857	0.6667	0.6917
	4	0.7778	0.8958	0.7778	0.7951
	5	0.7778	0.9556	0.7778	0.8171
\(d_{model}\)	8	0.6667	0.9524	0.6667	0.7321
	16	0.7778	0.9556	0.7778	0.8172
	32	0.7778	0.7889	0.7778	0.7669
	64	0.9444	0.9722	0.9444	0.9521
	128	0.8333	0.9583	0.8333	0.8539
\(\tau\)	0.01	0.8889	0.9630	0.8889	0.9095
	0.05	0.6667	0.7857	0.6667	0.6917
	0.1	0.8333	0.9583	0.8333	0.8539
	0.5	0.8333	0.9583	0.8333	0.8539
	1	0.9444	0.9722	0.9444	0.9521
N	16	0.7222	0.9537	0.7222	0.7841
	32	0.8333	0.9583	0.8333	0.8539
	64	0.7778	0.8889	0.7778	0.8264
	128	0.9444	0.9722	0.9444	0.9521
	256	0.8333	0.9583	0.8333	0.8667

Table 8. Diagnosing Performance with Different Hyper-parameters in the Respiration Event Detection Component

The best result in each column is in bold.

—

Layer number of CNN \(L_c\). The results show that stacking CNN layers helps extract more complex features from the respiration signals, and can boost diagnosing performance of Respnea. The optimal setting for \(L_c\) is seven layers, and the performance declines when \(L_c\) is greater than 7, largely due to overfitting.

—

Layer number of self-attention \(L_s\). The results show that Respnea benefits from a smaller \(L_s\), and reaches the optimal performance when \(L_s\) is set to two layers. This phenomenon indicates that a shallow self-attention structure is enough for Respnea to learn the transition patterns of the respiration events, after a deep CNN-based encoder.

—

Dimension of sequence embeddings \(d_{model}\). As can be seen in Table 8, the performance of the model gradually increases as \(d_{model}\) varies from 8 to 64, and starts to decrease when \(d_{model}\) continues to increase. The reason for this phenomenon may be that the large dimension of embeddings leads to overfitting. Therefore, we set \(d_{model}\) to 64.

—

Temperature parameter \(\tau\). In the contrastive learning module, \(\tau\) is introduced to control the strength of penalties on hard negative samples. The result shows that the optimal setting for \(\tau\) is 1, which indicates the model does not need too many penalties on hard negative samples.

—

Batch size N. The results show that Respnea benefits the most when N is set to 128, and a smaller or larger N leads to performance decrease.

4.6 Impact of Different Factors

In this section, we evaluate the impact of different factors on the performance of respiration rate estimation, including the sleeping posture and the distance and direction between the subject and the device. Note that we cannot evaluate the impact of these factors on SAHS classification, because the SAHS severity is a metric for assessing the natural sleep of a subject. Each subject is diagnosed as one of the four SAHS severities based on his/her whole-night sleep, during which the subject can move around in the bed unconsciously or consciously with different sleeping postures, which leads to different distances and directions between the subject and the device. Considering that forcing a subject to stay still or in a certain sleeping posture during sleep is impractical and violates ethical guidelines, the SAHS classification under a certain sleeping posture or a certain distance/direction to the device is not evaluated.

4.6.1 Impact of Sleeping Posture.

To observe the impact of different sleeping postures on performance, we evaluate Respnea under four typical sleeping postures on the in-hospital dataset, including supine, left lateral, right lateral, and prone positions, as shown in Figure 17. The posture information during sleep is recorded by the PSG sensors. The results in Table 9 indicate that the errors under left and right lateral postures are smaller than those under supine and prone postures. This is because our device is placed on the nightstand beside the bed, and when the subjects change their sleeping postures from lateral positions to supine or prone positions, the body’s reflection surface changes from the chest or the back to the side of the body. The effective signal reflection surface and motion displacement are both reduced, thus resulting in the slight increase of the errors.

Sleeping posture	Median error (bpm)	90-quantile error (bpm)	Time proportion (%)
Supine	0.2417	0.6289	41.63
Left lateral	0.2120	0.5445	29.43
Right lateral	0.2120	0.5296	26.93
Prone	0.2496	0.5634	2.01

Table 9. Impact of Different Sleeping Postures on Respiration Rate Estimation

Fig. 17.

4.6.2 Impact of Subject-device Distance.

In this part, we investigate the impact of different subject-device distances on the performance of Respnea. During sleep, the subject unconsciously moves his/her body, which results in varying distances from the UWB device. From the in-hospital dataset, we select subject-device distances during sleep ranging from 40 to 100 cm in steps of 15 cm, as shown in Figure 18(a). The experimental results in Figure 18(b) show that Respnea achieves median errors of 0.20, 0.23, 0.25, 0.28, and 0.31 bpm at these distances, respectively. The estimation error increases slightly as the distance increases, which is likely because the reflected signals weaken with distance.

Fig. 18.
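To put the distance trend reported above into a single number, the following minimal sketch fits an ordinary least-squares line to the five median errors. The linear fit and the helper name `least_squares_slope` are illustrative assumptions introduced here and are not part of the evaluation itself; only the distances and median errors come from the text.

```python
# Subject-device distances (cm) and median errors (bpm) as reported in the text.
distances_cm = list(range(40, 101, 15))          # 40, 55, 70, 85, 100
median_err_bpm = [0.20, 0.23, 0.25, 0.28, 0.31]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Illustrative assumption: the error grows roughly linearly over this range.
slope = least_squares_slope(distances_cm, median_err_bpm)
print(f"{slope * 10:.3f} bpm per 10 cm")  # prints "0.018 bpm per 10 cm"
```

Under this assumption, the median error grows by roughly 0.018 bpm for every additional 10 cm within the tested range of 40 to 100 cm; the fit says nothing about distances outside that range.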

4.6.3 Impact of Subject-device Direction.

Our UWB device has a typical opening angle of 65\(^{\circ }\) in both azimuth and elevation. To evaluate the impact of different subject-device directions on Respnea, we first rotate the device so that its horizontal angle to the subject is 0\(^{\circ }\), 20\(^{\circ }\), 40\(^{\circ }\), and 60\(^{\circ }\), as shown in Figure 19(a). Figure 19(b) shows that Respnea achieves median errors of 0.17, 0.21, 0.25, and 0.30 bpm on respiration rate estimation, respectively. Subsequently, we adjust the height of the device's placement so that the vertical angle between the device and the subject varies from 0\(^{\circ }\) to 60\(^{\circ }\), as shown in Figure 20(a). Figure 20(b) shows that Respnea achieves median errors of 0.17, 0.19, 0.20, and 0.28 bpm on respiration rate estimation, respectively. The results for both horizontal and vertical angles indicate that the error increases slightly as the angle increases, which is due to the gradually weakening signal reflection.

Fig. 19.

Fig. 20.

4.7 Generalization Capability Test

To evaluate the performance of Respnea on unseen data from non-hospital scenarios, we conduct experiments in home scenarios, including dormitories and bedrooms, as shown in Figure 21.

Fig. 21.

We collect the data of 15 nights from 6 subjects in bedrooms and 15 nights from 6 subjects in dormitories. As the ground truth in these non-hospital scenarios, we adopt a sleeping pad (i.e., Withings Sleep [3]) to obtain the SAHS severity. Respnea, trained on the in-hospital dataset, is employed to diagnose the SAHS severities of the subjects in the new datasets. The experimental results are shown in Table 10. Respnea reaches an overall accuracy of 0.9667 in diagnosing SAHS severity across bedrooms and dormitories, which supports the generalizability of our model.

Scenario	Accuracy	Precision	Recall	F1-score
Bedroom	0.9333	1.0000	0.9333	0.9655
Dormitory	1.0000	1.0000	1.0000	1.0000
Overall	0.9667	1.0000	0.9667	0.9831

Table 10. Performance of Diagnosing SAHS Severity in Different Home Scenarios
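As a quick consistency check on Table 10, the sketch below recomputes each F1-score as the harmonic mean of the listed precision and recall. It assumes the table uses this standard definition; the helper name `f1_score` is introduced here for illustration. The night counts in the last lines are also an assumption: they suppose one severity diagnosis per recorded night, which the text does not state explicitly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1-score) copied from Table 10.
table10 = {
    "Bedroom": (1.0000, 0.9333, 0.9655),
    "Dormitory": (1.0000, 1.0000, 1.0000),
    "Overall": (1.0000, 0.9667, 0.9831),
}

computed = {name: f1_score(p, r) for name, (p, r, _) in table10.items()}
for name, (_, _, reported) in table10.items():
    print(f"{name}: computed {computed[name]:.4f}, reported {reported:.4f}")

# Assumption: one diagnosis per night, so the accuracies would correspond to
# 14 of 15 bedroom nights and 29 of 30 nights overall.
bedroom_acc = round(14 / 15, 4)   # 0.9333
overall_acc = round(29 / 30, 4)   # 0.9667
```

The recomputed values agree with the reported F1-scores to four decimal places, so the table is internally consistent under the stated assumptions.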

4.8 Case Study

In this subsection, we further evaluate the performance of Respnea on overnight respiration profiling and respiration event detection using two specific cases.

4.8.1 Profiling Overnight Respiration.

We take the data of one subject from the in-hospital dataset. Figure 22 shows the subject state distribution during the monitoring period (from 19:00 on the first day to 11:00 on the second day), together with the recorded actions of the subject and the doctor. The subject is in the bed and stays awake between 19:00 and 19:45 (Figure 22 shows that the subject is in State III or State IV during this period). The subject leaves the bed at 19:45 and returns at 20:07 (State I during this period). Then the doctor helps the subject put on the PSG sensors for nearly 20 minutes (State II), after which the subject falls asleep (Respnea detects the sleep onset and exit as described in Section 3.2.5). The subject wakes up at about 5:30 on the second day (transition from State IV to State III) and leaves the bed once (State I). After interacting several times with others in the same room at about 7:00 (State II), the subject lies in the bed until 9:33. Then the subject leaves the room (State I). The doctor turns off our device at around 11:00 and ends the monitoring.

Fig. 22.

This case illustrates that Respnea can track the distribution of subject states over a long monitoring period, which makes it a feasible solution in non-controlled practical scenarios.

4.8.2 Detecting Respiration Events.

Figure 23 shows an example of detecting respiration events. The first subfigure shows the ground-truth waveform (i.e., thoracic motions in PSG) for a period of time in which an obstructive sleep apnea event occurs. The second subfigure shows the amplitude and phase sequences of the UWB signals during the same period. The last subfigure shows the per-second predictions of our model. It can be seen that (i) the UWB waveforms are highly similar to the ground-truth waveform and (ii) our model is able to accurately predict the sleep state at each second and to rectify unreasonable predictions using the voting-based method (see Section 3.3.2). Eventually, a sleep apnea event is detected after the model aggregates the continuous predictions of sleep states.

Fig. 23.
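The two post-processing steps in this case, rectifying isolated per-second predictions and then aggregating them into an event, can be illustrated with a minimal sketch. This is not the voting-based method of Section 3.3.2, whose details are not repeated here. The centered majority vote, the window length of 5 s, the label names, and both function names are assumptions made for illustration. The 10 s minimum duration follows the definition of sleep apnea given in the introduction.

```python
from collections import Counter


def majority_smooth(labels, window=5):
    """Replace each per-second label with the majority label in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return smoothed


def aggregate_events(labels, target="apnea", min_duration_s=10):
    """Return (start, end) second indices, end exclusive, of long enough target runs."""
    events, start = [], None
    for i, label in enumerate(list(labels) + [None]):  # sentinel closes a trailing run
        if label == target and start is None:
            start = i
        elif label != target and start is not None:
            if i - start >= min_duration_s:
                events.append((start, i))
            start = None
    return events


# Hypothetical predictions: a 12 s apnea episode with one spurious "normal" second.
raw = ["normal"] * 5 + ["apnea"] * 6 + ["normal"] + ["apnea"] * 5 + ["normal"] * 5

events_without_voting = aggregate_events(raw)               # []
events_with_voting = aggregate_events(majority_smooth(raw))
print(events_with_voting)                                   # [(5, 17)]
```

In this toy input, the single misclassified second splits the episode into runs of 6 s and 5 s, so no event is reported without smoothing. After the majority vote, the run is contiguous and one 12 s event is detected.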

5 Discussion

In this section, we discuss the possible cost and overhead of deploying our system in real-life settings (e.g., in-home scenarios) for daily use.

Hardware. All the components of the device used in our system are commercial off-the-shelf (COTS), including a consumer-level UWB chip [2] and a Raspberry Pi [1].

Network. The current version of our system is a component-based system, not yet an end-to-end system, with data being collected first and then exported from the device for further analysis. However, for the envisioned use of our system as a commercial product in daily life, continuous data collection, transmission, and analysis will be necessary. Therefore, IoT SIM cards and cloud servers are needed for the actual system deployment.

User Interface. A mobile app is expected to be developed to present the sleep reports (including overnight respiration rates, motions, and SAHS severity) to users.

6 Conclusion

In this article, we design Respnea, a non-intrusive and fine-grained sleep respiration monitoring system. By exploiting UWB signals, Respnea enables respiration rate estimation and subject state classification during sleep. Furthermore, Respnea can be used to diagnose the SAHS severity of a subject, facilitating early warning and follow-up for patients with SAHS. The experimental results demonstrate the feasibility of Respnea in both in-hospital and in-home scenarios.

References

[1] 2020. Raspberry Pi 4 Computer Model B 8GB Single Board Computer. Retrieved from https://www.amazon.com/Raspberry-Pi-Computer-Suitable-Workstation/dp/B0899VXM8F/
