This section describes the components of the MHDeep framework. We first give an overview of our approach. We then discuss the data collection and preparation process, synthetic data generation, and grow-and-prune DNN synthesis.
3.2 Data Collection and Preparation
We collected WMS data from 74 adult participants at the Hackensack Meridian Health Carrier Clinic, Belle Mead, New Jersey. The participants were diagnosed by medical professionals at the clinic. They fall into four categories: 25 healthy participants (no mental health disorder), 23 participants with bipolar disorder, 10 participants with major depressive disorder, and 16 participants with schizoaffective disorder. The experimental procedure for data collection and analysis was approved by the Institutional Review Board of Princeton University. The participants’ data were captured by a commercially available Empatica E4 smartwatch [1] and a Samsung Galaxy S4 smartphone, as shown in Figure 3.
We first collected data from an extensive range of sensors embedded in both the smartwatch and the smartphone. We then compared the mean and standard deviation of the data from each sensor across cohorts and identified a final set of eight sensors as the most informative for distinguishing among the four cohorts. Table 1 summarizes the final set of data types used in this study. The physiological signals are derived from WMSs embedded in the smartwatch. They include GSR, which measures sympathetic nervous system arousal; IBI, which indicates the heart rate; ST, which provides skin temperature readings; and a three-axis accelerometer (Acc-W), which measures acceleration in the \(x\), \(y\), and \(z\) directions. The information collected from these sensors is useful for detecting various mental disorders. For example, the electrodermal response can be used as a feature to detect patients affected by depression or to detect different mood disorders [47]. Bipolar disorder is associated with cardiac autonomic dysregulation [9], which affects IBI. Skin temperature can be used as a feature to detect stress [28] and bipolar disorder [39].
In addition to the physiological signals, ambient and motion information is captured by sensors in the smartphone: ambient temperature (Temp), gravity (Grav), acceleration (Acc-P), and angular velocity (Vel). This information may also help in detecting the mental state of the user. For example, Berle et al. [6] show that the motor activity of schizophrenic and depressed patients is significantly reduced. Ambient temperature can also affect an individual’s mental state and the severity of a mental disorder: Mullins and White [38] argue that cold temperatures can reduce the adverse effects of mental disorders, whereas hot temperatures can exacerbate them. Note that the acceleration sensors in the smartphone and smartwatch have different sampling rates and capture different motion information.
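The sensor-screening step described above (comparing the mean and standard deviation of each sensor across cohorts) can be sketched as follows. This is an illustrative sketch only: the nested-dictionary layout and the function name `cohort_summary` are assumptions introduced here, and any values passed in would be placeholders rather than study data.

```python
from statistics import mean, stdev


def cohort_summary(readings):
    """Summarize each sensor per cohort.

    `readings` is a hypothetical layout: cohort name -> sensor name -> list of
    float readings (at least two per list). Returns
    sensor name -> cohort name -> (mean, sample standard deviation).
    """
    summary = {}
    for cohort, sensors in readings.items():
        for sensor, values in sensors.items():
            summary.setdefault(sensor, {})[cohort] = (mean(values), stdev(values))
    return summary
```

A sensor whose per-cohort means differ by much more than the corresponding standard deviations would be a natural candidate for the final sensor set under this kind of screening.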
Before data collection, all participants are informed about the experiment and asked to sign a consent form. The Empatica E4 smartwatch is placed on the wrist of the participant’s non-dominant hand and the Samsung Galaxy S4 smartphone in the front pocket on the opposite side. We maintain the same phone orientation for all participants. Data collection lasts around 1.5 hours, during which the participant is free to move around the room while wearing the devices. Throughout this period, the smartwatch and smartphone continuously record and store physiological signals and ambient/motion information. At the end of the data collection period, we remove the smartwatch from the participant’s wrist and the smartphone from their pocket. We retrieve the smartwatch data through the Empatica E4 Connect portal and download the smartphone data streams using a private Android application. All recorded data are timestamped at the time of sampling.
Next, we preprocess the dataset for use in DNN training. We first synchronize the smartwatch and smartphone data streams for each participant. This is necessary because the WMS data streams may differ in their start times and sampling frequencies. Then, we divide the data for each participant into 15-second windows. This window size was chosen based on experiments with the validation set, as discussed later. Each 15-second window of the combined smartwatch/smartphone data constitutes one data instance. There is no time overlap between data instances. To obtain each data instance, we flatten and concatenate the data within the same time window from both the smartwatch and the smartphone. This results in a feature space of dimension 2,325, of which the smartwatch contributes 1,575 features and the smartphone 750. All the smartphone sensors have a sampling rate of 5 Hz. The smartwatch sensors provide one data stream at 32 Hz, two data streams at 4 Hz, and one data stream at 1 Hz.
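The windowing and flattening steps, and the resulting feature count, can be illustrated with the following minimal sketch. The sampling rates and the 15-second window come from the text. The assignment of rates to individual smartwatch streams and the number of channels per stream (three for Acc-W, Grav, Acc-P, and Vel; one for the others) are assumptions, chosen because they reproduce the stated totals of 1,575 and 750 features. The function names are hypothetical, and the sketch assumes the streams have already been synchronized to a common start time.

```python
# Assumed per-stream layout: (sampling rate in Hz, number of channels).
# The set of rates comes from the text; the mapping and channel counts are assumptions.
SMARTWATCH_STREAMS = {
    "Acc-W": (32, 3),  # three-axis accelerometer
    "GSR": (4, 1),
    "ST": (4, 1),
    "IBI": (1, 1),
}
SMARTPHONE_STREAMS = {
    "Temp": (5, 1),
    "Grav": (5, 3),
    "Acc-P": (5, 3),
    "Vel": (5, 3),
}
WINDOW_SECONDS = 15


def features_per_window(streams, window_seconds=WINDOW_SECONDS):
    """Count the flattened values that a set of streams contributes to one window."""
    return sum(rate * channels * window_seconds for rate, channels in streams.values())


def make_instances(recording, streams, window_seconds=WINDOW_SECONDS):
    """Split synchronized streams into non-overlapping windows and flatten each one.

    `recording` maps stream name -> list of samples, where each sample is a
    list of channel values and all streams start at the same time.
    """
    # Number of complete windows available in the shortest stream.
    n_windows = min(
        len(recording[name]) // (rate * window_seconds)
        for name, (rate, _) in streams.items()
    )
    instances = []
    for w in range(n_windows):
        row = []
        for name, (rate, _) in streams.items():
            step = rate * window_seconds  # samples of this stream per window
            for sample in recording[name][w * step:(w + 1) * step]:
                row.extend(sample)
        instances.append(row)
    return instances
```

Under these assumptions, the smartwatch contributes 32·3·15 + 2·(4·15) + 1·15 = 1,575 values per window and the smartphone 5·10·15 = 750, which gives the 2,325-dimensional feature space. Any trailing samples that do not fill a complete window are dropped in this sketch.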
Because the participants remain in a room during data collection, their range of motion is limited, so higher sampling rates for the smartphone sensors are not needed. In addition, the Empatica E4 used for data collection is a medical-grade smartwatch designed to capture each physiological signal at its optimal sampling rate. Although collecting data from more sensors at higher sampling rates may provide more information, unnecessarily high sampling rates reduce the battery life of the device. Moreover, because each data instance spans a 15-second window, it contains multiple readings from every sensor, which compensates for the low sampling frequency of some of the sensors.
Because the number of individuals in each of the four categories (healthy and three disorders) is small, we created three different data partitions for evaluating each classification task. We generated these partitions by applying circular shifts to a list of numbers denoting the participants in each category, with a stride equal to the number of test individuals in that category (given below). The data instances extracted from the individuals in each of the four groups (healthy, schizoaffective, depressive, and bipolar) were divided into training, validation, and test sets. To evaluate the models on unseen individuals, the three sets draw their data instances from different individuals; i.e., no individual contributes data to more than one set. Among the healthy participants, in each of the three data partitions, data instances from 15 individuals (60%) form the training set, from 5 individuals (20%) the validation set, and from the remaining 5 individuals (20%) the test set. For participants with bipolar disorder, the training, validation, and test sets contain data instances from 13, 5, and 5 individuals, respectively. For participants with major depressive disorder, they contain data instances from 6, 2, and 2 individuals, respectively. For participants with schizoaffective disorder, they contain data instances from 10, 3, and 3 individuals, respectively.
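One way to realize the circular-shift partitioning is sketched below. The group sizes and the stride (the number of test individuals per group) come from the text. The order in which the shifted list is sliced into training, validation, and test sets is an assumption, as are the function names; the text does not specify these details.

```python
# (train, validation, test) participant counts per group, as stated in the text.
GROUP_SPLITS = {
    "healthy": (15, 5, 5),
    "bipolar": (13, 5, 5),
    "depressive": (6, 2, 2),
    "schizoaffective": (10, 3, 3),
}


def circular_shift(items, shift):
    """Rotate a list to the left by `shift` positions."""
    shift %= len(items)
    return items[shift:] + items[:shift]


def make_partitions(n_train, n_val, n_test, n_partitions=3):
    """Build participant-disjoint splits by circularly shifting the ID list.

    Assumption: after shifting, the first n_train IDs are used for training,
    the next n_val for validation, and the last n_test for testing.
    """
    ids = list(range(n_train + n_val + n_test))
    partitions = []
    for k in range(n_partitions):
        # Stride equals the number of test individuals in the group.
        shifted = circular_shift(ids, k * n_test)
        partitions.append({
            "train": shifted[:n_train],
            "val": shifted[n_train:n_train + n_val],
            "test": shifted[n_train + n_val:],
        })
    return partitions
```

For example, `make_partitions(*GROUP_SPLITS["healthy"])` yields three partitions in which no participant ID appears in more than one of the training, validation, and test sets. With this slicing order, the test sets of the three partitions are also mutually disjoint for all four groups.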
We create the final dataset for each binary classification task (healthy vs. a mental health disorder) by combining the corresponding training, validation, and test sets of the two classes involved in that task. We use SMOTE [10] to up-sample data instances from the minority class. The up-sampling is applied only to the training set. Table 2 shows the number of instances for each classification task for all three data partitions.
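The idea behind SMOTE-style up-sampling of the training set can be illustrated with the simplified sketch below. It is not the reference SMOTE implementation and is not claimed to match the configuration used in this study: the neighbor count `k`, the random choice of base points, the target of exact class balance, and the function names are all assumptions made for illustration. Each synthetic instance is placed at a random position on the line segment between a minority-class instance and one of its nearest minority-class neighbors.

```python
import math
import random


def smote_upsample(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority instances to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbors of the base point, excluding itself.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def balance_training_set(majority, minority, **kwargs):
    """Up-sample the minority class until it matches the majority class in size."""
    deficit = max(len(majority) - len(minority), 0)
    return minority + smote_upsample(minority, deficit, **kwargs)
```

Only the training instances would be passed to `balance_training_set`; the validation and test sets are left untouched, so evaluation reflects real data instances only.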