1 Introduction
Humans spend approximately one-third of their lives sleeping. High-quality sleep is vital both for the proper short-term functioning of the human body and for long-term good health [1]. Chronic sleep deprivation, such as regularly sleeping less than the recommended 7–8 hours, has been associated with multiple health disorders and risks in the cardiovascular, respiratory, neurological, gastrointestinal, immunological, dermatological, endocrine, and reproductive systems [2, 3]. According to national surveys [4, 5], only 25 to 50% of adults in the United States slept the recommended 7–8 hours, and 20 to 35% reported consistent sleep difficulties. According to an international survey [4], up to 40% of adults show sleep difficulties. These surveys indicate that poor sleep health is prevalent at both local and global scales. Due to the well-recognized importance of sleep and the high prevalence of its inadequacy, sleep monitoring has become an important research area.
A key metric for monitoring sleep is the spatial and temporal understanding of sleep postures through the night, as the postures directly influence sleep behavior and critical parameters [6, 7, 8]. Each of us sleeps in one of several broad categories of posture, such as supine, lateral, fetal, and so on, and exhibits wide variations of them throughout the night [9, 10]. The effects of different sleep postures have been studied widely to identify their relationship to different health conditions, such as brain activities [11], nocturnal bruxism [12], carpal tunnel syndrome [13], progressive glaucoma [14], motor development in infants [15], back pain in adults [16], waking cervical [17], and scapular and arm pain [18].
Specific sleep postures could also be fatal, depending on pre-existing medical conditions. For example, the supine posture is linked with exacerbating obstructive sleep apnea by creating unfavorable airway geometry, causing a reduction in lung volume, and limiting the movement of airway dilator muscles, which could be life-threatening [19]. Infrequent turns due to impaired motor control in Parkinson’s patients lead to parasomnia and restless leg syndrome [20]. Prone sleep posture is associated with nearly 73% of sudden unexpected deaths of epilepsy patients [21]. Infrequent changes in sleep posture are also the primary cause of pressure ulcers (i.e., bedsores) in post-surgical and elderly patients. Additionally, physicians recommend different sleep postures for different medical conditions: sleeping on the side to reduce snoring, on the left side to prevent heartburn, in the supine posture to reduce back or shoulder pain, in the fetal posture during pregnancy, or in specific posture variations during post-surgery recovery [22, 23, 24, 25].
These examples highlight the importance of a sleep posture monitoring system that can provide real spatio-temporal observations, which could not only help with corrections but also prevent fatal accidents.
Since it takes time to train a patient to adapt to a new sleep posture, physicians may need to frequently observe the fine-grained posture, such as skeletal information, and its changes throughout the night. Apart from error-prone qualitative assessment, where doctors ask patients (or their caretakers/partners) about their sleep postures, in-clinic quantitative assessment relies on visually observing the postures or inferring them by analyzing physiological signals from devices attached to a patient, such as electrocardiography monitors, accelerometers, and so on [26, 27, 28]. Such assessments are typically done in a hospital setting, requiring patients to stay there overnight [29]. Recent works aimed to provide at-home monitoring [27, 30, 31], but they mostly require users to wear sensors or place sensors on the bed. Wearables are not only cumbersome during sleep but also unreliable, since certain patients, such as the elderly, tend to forget to wear them. Pressure sensors attached to the bed mattress are an alternative solution [7, 32, 33, 34], but they are costlier and often cause sleep discomfort. Contactless sleep monitoring, such as vision-based solutions, relies on optical and depth cameras to accurately monitor sleep postures, but these solutions are highly privacy-invasive, and their performance is hindered by dark bedroom conditions and occlusion by blankets [35, 36, 37, 38]. Wireless-based solutions can overcome these challenges by inferring postures in the dark without being privacy-invasive [39, 40, 41], but existing solutions rely on special-purpose low-frequency devices. Besides, they can only classify sleep postures into broad, discrete categories [42] and are unable to provide fine-grained posture information, such as the locations of different body joints.
Fortunately, high-frequency millimeter-wave (mmWave) wireless devices provide an effective alternative to the existing systems to enable fine-grained posture monitoring: mmWave signals can penetrate certain obstacles, work under zero visibility, and have higher resolution than Wi-Fi. So, mmWave imaging can facilitate “seeing” the body posture under dark conditions and under the blanket. Besides, mmWave transceivers are poised to soon become ubiquitous in all 5G-and-beyond smart home devices, such as routers and access points, creating the opportunity to bring a non-privacy-invasive human sleep posture monitoring system to the masses at home. However, there exist two fundamental challenges in mmWave imaging. First, mmWave signals could be absorbed by many body parts or specularly reflect from them in different directions, away from the device, causing most signals to never reach the receiver [43]. As a result, the output human shape could have many missing parts, from which it is difficult to infer joint locations. Second, mmWave devices have extremely low resolution compared to vision-based systems; so, many high-frequency components, such as the contour and limbs, will be eliminated from the generated images [44]. Moreover, the reflected signals carry additional information about the bed and surrounding objects close to the body, making it harder to separate the human shape. So, it is challenging to extract body joint information and changes directly from traditional mmWave imaging during sleep.
To overcome these challenges, we propose MiSleep, a single-person sleep posture monitoring system that leverages signal processing and deep learning models to enable fine-grained monitoring continuously and non-intrusively with commodity mmWave devices. Instead of generating a mmWave image with traditional algorithms and then predicting the body joint locations, MiSleep directly predicts the joint locations from the reflected mmWave signals by learning the hidden association between them from thousands of data samples. To learn such an association, MiSleep employs a customized Deep Convolutional Neural Network (DCNN) that predicts the 3D locations of several key body joints from the reflected signals captured by multiple mmWave antennas. The reflected signals carry amplitude and phase information at various azimuth angles, elevation angles, and depths, and the DCNN extracts relevant features from multiple spatially co-located antennas’ signals to formulate a mapping from the signals to the body joints. Furthermore, to generalize the model for diverse populations, MiSleep models a height classifier and uses the error in its prediction to fine-tune the model. We use a dataset collected from several static sleep postures of multiple volunteers, and at runtime, MiSleep can predict 3D joint locations directly from the mmWave signals. However, the reflected mmWave signals could be corrupted by various factors, such as the Doppler effect, during toss-turns in sleep. For example, a person could turn from a lateral to a supine posture in the middle of the night. Predicting the body posture during such sudden movements, with a model trained on static postures, is not only challenging but also less useful, since toss-turns usually last only a few seconds. Therefore, MiSleep includes a toss-turn detection module that first classifies the sleeping state as either rest or toss-turn. Then, it predicts the joint locations only during the rest state.
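The rest versus toss-turn gating described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each frame is a flat list of signal magnitudes and that a drop in the normalized correlation between consecutive frames indicates movement. The function names, the frame format, and the 0.9 threshold are hypothetical and are not taken from MiSleep's actual detector.

```python
import math


def normalized_correlation(a, b):
    """Pearson-style correlation between two equal-length signal frames."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        # Degenerate (constant) frame: treat identical frames as correlated.
        return 1.0 if list(a) == list(b) else 0.0
    return cov / math.sqrt(var_a * var_b)


def label_states(frames, threshold=0.9):
    """Label every frame as 'rest' or 'toss-turn'.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    states = ["rest"]  # the first frame has no predecessor; assume rest
    for prev, cur in zip(frames, frames[1:]):
        corr = normalized_correlation(prev, cur)
        states.append("rest" if corr >= threshold else "toss-turn")
    return states


def rest_indices(states):
    """Indices of frames on which a joint predictor would be run."""
    return [i for i, state in enumerate(states) if state == "rest"]
```

In this sketch, only the frames returned by `rest_indices` would be passed on to a joint-location predictor, mirroring the idea of predicting joints only during the rest state.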
We design and prototype MiSleep with Commercial-Off-The-Shelf (COTS) devices by building a customized setup with two 77–81 GHz mmWave transceivers [45] to collect the reflected signals and a Microsoft Kinect Xbox One [46] to collect the ground truth 3D joint locations. We collect reflected signals and ground truths of nearly 70k samples (> 18.5 GB) from multiple experiments under different conditions with nine volunteers. The experimental results show that MiSleep can detect all ground truth toss-turn events and can identify the start time and duration within 1.25 s and 1.7 s of the ground truth, respectively, for all cases. For static sleep postures with a base model, MiSleep predicts the 3D location of body joints with a median error of only 1.3 cm. Furthermore, MiSleep generalizes well for diverse volunteers with median and 90th percentile errors of 2.3 cm and 7.4 cm, respectively. Even when we test performance on unseen volunteers, we observe that MiSleep provides acceptable joint prediction. Moreover, MiSleep can identify the body postures and locate the joints under blankets with similar accuracies. Finally, MiSleep can classify volunteers by leveraging their height information with an accuracy of 87% to 94% for eight volunteers and can classify five broad categories of postures with an accuracy above 90% for all volunteers.
In summary, we make two contributions: (1) We design a customized deep learning framework for predicting the 3D location of body joints during sleep from COTS mmWave devices, which generalizes well for a diverse population. To the best of our knowledge, MiSleep is the first system to infer sleep postures in the form of 3D joint locations from a COTS mmWave device and to achieve accuracy on par with the existing vision-based systems. (2) We design a toss-turn detection module that can accurately identify key sleep events and their timing information from the reflected mmWave signals. To accelerate the research on COTS mmWave device-based sleep monitoring, we will open-source our dataset and codebase.
6 Comparison with A State-of-the-art Work
Since our proposed approach is the first of its kind in using mmWave wireless signals for 3D joint estimation during sleep, our comparison is limited to works that concentrate on activities such as standing or performing exercises. Specifically, we compare MiSleep with RF-Pose [85], a state-of-the-art technique in the RF domain that estimates human poses by generating skeletons from radio signals during standing postures. RF-Pose implements a student-teacher network approach, where the teacher network learns from RGB images and demonstrates high accuracy in estimating human pose. In our system, the ground truth pose information from Kinect’s depth images is highly accurate and, thus, acts as the output of the teacher network to guide the student network. Similar to RF-Pose, the student network learns to estimate the pose by computing the loss between the output of the teacher network (based on Kinect’s human pose) and its own predictions. Based on the loss, the network adjusts its parameters during training to minimize the difference between the teacher network’s output and the student network’s prediction. As training progresses, the student network gradually improves its performance and aligns its pose estimation with the teacher’s output. This iterative process of minimizing the loss helps the student network refine its pose-estimation capabilities and achieve higher accuracy over time.
We follow an architecture similar to RF-Pose, making some adjustments to match our specific input and output requirements. As in RF-Pose, our student network follows an encoder-decoder design, but with modifications to accommodate the input from our horizontal and vertical mmWave devices, as shown in Figure 23. Our heatmaps are generated from the horizontal and vertical antenna arrays of our mmWave devices. The size of these heatmaps is 26 × 64. Since sleep is a static activity, we do not consider the temporal dimension and set it to 1. Consequently, our input to the encoder is represented as 1 × 26 × 64 × 1, indicating the number of channels × depth × height × width. This modified encoder-decoder architecture allows us to process the heatmap information from our mmWave devices effectively and estimate human pose accurately for sleep monitoring. Similar to the architecture used in RF-Pose, our encoder consists of 10 layers of spatio-temporal convolutions with a filter size of 9 × 5 × 5. Every other layer applies 1 × 2 × 2 strides on spatial dimensions. Batch normalization and ReLU activation functions are applied after each layer, and max-pooling is performed every second layer. The encoder takes the horizontal and vertical heatmaps as input and encodes them into a 32 × 2 × 1 × 1 tensor, separately for each heatmap. For the decoder, we utilize a combination of spatio-temporal convolutions and fractionally strided convolutions to decode the pose into a 21 × 3 tensor. The overall architecture is similar to RF-Pose, but we modify the output layer to generate the 21 × 3 pose information. To train the network, we use the mean-squared error as the loss function, which measures the difference between the ground truth pose and the generated pose. The network is trained using a batch size of 24, with the Adam optimizer and a learning rate of 0.001. The implementation is done in the PyTorch framework.
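To make the training objective concrete, the following is a minimal pure-Python sketch of a mean-squared-error loss over a single 21 × 3 pose. It assumes the loss is averaged over all 63 coordinates, which matches the default mean reduction of PyTorch's `nn.MSELoss`; the text does not state the reduction used, and the function name is illustrative.

```python
def mse_pose_loss(predicted, ground_truth):
    """Mean-squared error over all joints and coordinates of one pose.

    Both arguments are sequences of joints, each joint an (x, y, z) triple.
    Averaging over every coordinate is an assumption (mean reduction).
    """
    total = 0.0
    count = 0
    for pred_joint, true_joint in zip(predicted, ground_truth):
        for p, t in zip(pred_joint, true_joint):
            total += (p - t) ** 2
            count += 1
    return total / count
```

For example, a predicted pose that is off by 0.1 along one axis for every joint, and exact on the other two axes, yields a loss of 0.01 / 3 under this definition.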
We compare the performance of MiSleep’s Rest Network and RF-Pose in terms of the Euclidean distance between the ground truth and predicted joint locations in Figure 24. In Figure 24(a), we observe that RF-Pose predicts joints with median and 90th percentile errors of 4.04 cm and 12.14 cm, respectively, while our baseline result shows median and 90th percentile errors of 1.3 cm and 6.24 cm. Additionally, we analyze the prediction error for individual joints in Figure 24(b). We observe that for all 21 joints, the median error is always less than 6 cm, which is comparable to our baseline in Figure 15(a), where the median error is always less than 4 cm across all 21 joints. However, we notice that with RF-Pose, the standard deviation is consistently lower across all joints, indicating lower variability across data points. We observe that implementing RF-Pose’s architecture for our input yields results that are quite similar to our baseline, indicating that the joint predictions produced by MiSleep are comparable to those achieved by the state-of-the-art human joint prediction system.
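The error metric used in this comparison can be sketched as follows: compute the Euclidean distance between each predicted and ground truth joint, then summarize the distances by their median and 90th percentile. The linear-interpolation percentile and the function names are assumptions made for illustration; the text does not specify how the percentiles were computed.

```python
import math
import statistics


def joint_errors(predicted, ground_truth):
    """Euclidean distance between each predicted and true 3D joint."""
    return [math.dist(p, t) for p, t in zip(predicted, ground_truth)]


def percentile(values, q):
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    low = math.floor(rank)
    high = math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def summarize(errors):
    """Median and 90th-percentile error, in the same unit as the inputs."""
    return statistics.median(errors), percentile(errors, 90)
```

Applied to the pooled errors of all test samples and joints, `summarize` would give the two summary numbers of the kind reported above for each system.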
7 Limitations and Future Works
Implementing MiSleep on mmWave Networking Devices: We have designed and prototyped MiSleep on COTS mmWave radars that operate on a 1.62 GHz bandwidth and can achieve a depth resolution of 9.26 cm. In the future, our goal is to deploy MiSleep for at-home monitoring by reusing COTS mmWave networking devices, such as 5G home wireless routers. We expect that MiSleep will perform even better on COTS mmWave communication devices. This is because current standard mmWave networking devices, which follow the IEEE 802.11ad protocol [48], operate on a 2 GHz bandwidth and can therefore achieve a better depth resolution of 7.5 cm. With a finer depth resolution, a device can distinguish reflections from two different points with higher confidence, and it should be able to predict joint locations with higher accuracy. Also, these devices can already measure the reflected signals from the Channel Estimation (CE) header field [48], requiring no protocol or packet format modifications. However, current COTS mmWave networking devices do not yet provide user access to the reflected signals without device firmware modifications. We propose to explore the feasibility of such access in the future.
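The two depth resolutions quoted above are consistent with the standard radar range-resolution relation ΔR = c / (2B), where B is the signal bandwidth. The short sketch below reproduces both figures; it assumes the rounded value c = 3 × 10^8 m/s, and the function name is illustrative.

```python
# Rounded speed of light; an assumption that matches the figures in the text.
SPEED_OF_LIGHT_M_PER_S = 3.0e8


def depth_resolution_cm(bandwidth_hz):
    """Range (depth) resolution c / (2B), converted to centimeters."""
    return SPEED_OF_LIGHT_M_PER_S / (2.0 * bandwidth_hz) * 100.0


radar_cm = depth_resolution_cm(1.62e9)      # prototype radar bandwidth
networking_cm = depth_resolution_cm(2.0e9)  # 802.11ad channel bandwidth
print(f"{radar_cm:.2f} cm vs {networking_cm:.2f} cm")
```

Because the resolution scales inversely with bandwidth, moving from 1.62 GHz to 2 GHz shrinks the resolvable depth cell by roughly 19%.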
Extending MiSleep for Two Persons on a Bed: MiSleep is currently designed and evaluated as a single-person sleep posture monitoring system. Extending MiSleep to monitor two persons simultaneously is very challenging. First, when two persons share a bed, they lie in proximity to each other, so the reflected signals from their bodies will be entangled with each other when they are received by MiSleep. Given that the depth resolution is only 9.26 cm, it might be hard to separate the mixed reflections. Second, two persons may move and toss-turn at the same time, which makes it harder to identify and track the toss-turn of an individual based on signal cross-correlation. One approach could be to track vital signs, such as breathing rate, heart rate, and so on, as two different individuals may show some distinct signatures, and then improve spatial resolution with digital beamforming over all receive antennas. Existing works [86, 87, 88, 89] can extract vital signs for multiple persons or improve spatial resolution to enable multi-person tracking. But they are limited to human motion and have not been adapted for sleep posture monitoring; we will investigate their feasibility and challenges in the future.
Long-term Sleep Monitoring and Field Deployment: In MiSleep, we analyze two critical aspects of sleep posture monitoring: (1) we identify the toss-turn events during sleep from reflected mmWave signals and classify the sleeping period into binary states; (2) we show that mmWave signals carry enough information, and we build and validate a model to accurately predict the 3D location of body joints. However, our observations and evaluations are limited to datasets collected over short durations, because the buffers in the mmWave devices and Kinect image acquisition [90] are small and fill up quickly at high sampling rates (25 fps and 30 fps). Therefore, we were unable to collect data for a long duration and analyze the joint performance of MiSleep’s toss-turn detector and sleep posture predictor. One option could be lowering the sampling rates to 1 or 2 fps and/or modifying the data collection software to enable real-time frame streaming to either a local PC or the cloud. In the future, we will extend the data collection period for long-term monitoring and evaluate MiSleep’s performance end-to-end.
8 Related Work
RF-based Sleep-posture Detection: Significant research efforts have been directed to understanding and inferring sleep posture using RF signals. These approaches use signals that bounce off the body of the person to determine the sleep posture. Reference [91] embeds RFID tags in bedsheets and enables a contactless sleep-monitoring system by recognizing sleep postures using a machine learning classifier. Reference [92] deploys passive RFID tags under a bed sheet to provide a non-intrusive and comfortable way of monitoring the sleep postures. Reference [39] uses signals from a Wi-Fi-like device to identify the angular orientation of a person to determine sleep posture. Reference [40] also uses Wi-Fi, exploiting fine-grained channel information to capture the minute movements caused by breathing and heartbeats. References [41, 93, 94] adopt off-the-shelf commodity Wi-Fi devices to monitor sleep, but they mainly focus on predicting sleep stages and identifying breathing and motion patterns rather than sleep postures. However, all these works provide a coarse representation of sleep and cannot identify the key body joint locations due to the low resolution offered by Wi-Fi-like devices. MmWave signals in ubiquitous commodity networking devices can enable such a system by representing the human body at a finer scale than Wi-Fi. Yet, it is challenging to extract body joint information directly from traditional mmWave imaging during sleep.
Millimeter-wave-based 3D Joint Estimation: In recent years, researchers have been able to extract meaningful information about humans using mmWave wireless signals from commodity devices [95, 96, 97]. In particular, extracting skeletal information has been the main focus, as it provides visual information about the 3D pose of a person [95, 98, 99, 100, 101, 102, 103]. For example, Reference [98] predicts the 3D location of 15 key body joints during human walking using a point-cloud representation of mmWave reflection points, and it can enable applications in pedestrian detection for traffic monitoring. Similarly, CLGNet estimates the 3D location of 21 joints using a commodity mmWave device, where it uses a global spatial information encoding module to improve the detection performance of key points in minor body parts (such as hands or feet) and achieves an average accuracy of 0.29 m [100]. Furthermore, Reference [101] incorporates forward kinematic constraints to learn the 3D skeleton without suffering from corrupted pose estimations. The input is in the form of a range-Doppler spectrum that intuitively helps in encoding the range and velocity of each joint, enabling the learning of the spatial rotation angle required for incorporating kinematic constraints. All these existing works focus on human motion and have not been adapted for sleep posture monitoring. Unlike in human motion, where previous timestamp information about joints can be utilized to predict joint locations in the next frame, we cannot leverage this fact fully in sleep posture monitoring. This is because, during sleep, a person is mostly in the rest state, and changes in posture take place abruptly. MiSleep overcomes these challenges by designing a toss-turn detector and a static sleep posture predictor, which together detect the precise time of a change in posture and then predict the postures during the rest states.
Sleep Behavior Assessment Using non-RF Modalities: Clinical approaches to monitoring sleep behavior include polysomnography, actigraphy, and ballistocardiography [104, 105, 106]. However, they require a patient to sleep overnight or for many hours with many sensors. Sleep behavior monitoring can be categorized into three aspects: sleep quality, sleep pattern, and sleep posture. In MiSleep, we focus on monitoring the sleep postures to provide visual information related to the 3D location of body joints during sleep. Accelerometer-based sleep posture monitoring systems require a person to wear a device and classify the posture based on the accelerometer readings [107, 108, 109]. Reference [110] designs a textile-based sensor system that can be worn like pajamas to measure physiological signals and classify sleep postures. A pressure sensor-based approach, Reference [34], uses thousands of tiny stretch sensors embedded in textiles on top of the bed mattress for sleep posture classification. Reference [111] uses multimodal information from a pressure sensor and an RGB camera to classify postures. Such pressure-based systems rely on distributed pressure sensors or arrays of force-sensing resistors and measure the pressure exerted by the body during sleep [33, 112, 113, 114]. The major drawback of all these systems is that they require additional hardware either attached to a person or embedded in a bed to detect postures, both of which could be cumbersome and cause sleep discomfort. In contrast, MiSleep enables contactless sleep posture monitoring by using the mmWave technology in ubiquitous 5G smart home devices.