Exploring Unsupervised Machine Learning Classification Methods for Physiological Stress Detection

Iqbal T, Elahi A, Wijns W and Shahzad A
Front. Med. Technol. (2022) 4:782756. doi: 10.3389/fmedt.2022.782756
*Correspondence: Talha Iqbal, t.iqbal1@nuigalway.ie
Over the past decade, there has been significant development in wearable health technologies for diagnosis and monitoring, including their application to stress monitoring. Most wearable stress monitoring systems are built on a supervised learning classification algorithm. These systems rely on the collection of sensor and reference data during the development phase. One of the most challenging tasks in physiological or pathological stress monitoring is the labeling of the physiological signals collected during an experiment. Commonly, different types of self-reporting questionnaires are used to label the perceived stress instances. These questionnaires only capture stress levels at a specific point in time. Moreover, self-reporting is subjective and prone to inaccuracies. This paper explores the potential feasibility of unsupervised learning clustering classifiers such as Affinity Propagation, Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH), K-mean, Mini-Batch K-mean, Mean Shift, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Ordering Points To Identify the Clustering Structure (OPTICS) for implementation in stress monitoring wearable devices. Traditional supervised machine learning (linear, ensemble, tree, and neighboring models) classifiers require hand-crafted features and labels, while the unsupervised classifier does not require any labels of perceived stress levels and performs classification based on clustering algorithms. The classification results of unsupervised machine learning classifiers are found comparable to supervised machine learning classifiers on two publicly available datasets. The analysis and results of this comparative study demonstrate the potential of unsupervised learning for the development of non-invasive, continuous, and robust detection and monitoring of physiological and pathological stress.

Keywords: machine learning, stress monitoring, physiological signals, heart rate, respiratory rate, unsupervised and supervised learning
INTRODUCTION

There has been a notable increase in depression, anxiety, stress and other stress-related diseases worldwide (1–3). Stress deteriorates the physical and mental well-being of a human. In particular, chronic stress leads to a weakened immune system, substance addiction, diabetes, cancer, stroke, and cardiovascular disease (4). Thus, it is of utmost importance to develop robust techniques that can detect and monitor stress continuously, in real time.
The concept of detecting stress is quite complex, as stress has physiological as well as psychological aspects to it. Furthermore, both these aspects are triggered by multiple factors and are difficult to capture (5). The recent development of wearable sensor technology has made it easier to collect different physiological parameters of stress in daily life.

The use of psychological assessment questionnaires, filled out at different instances in a day, is the most common technique to determine human stress. These questionnaires are limited to capturing stress at a particular time and do not allow continuous or real-time stress monitoring (6). The time-bound nature of these questionnaire-based assessments poses a major problem for the validation of new stress monitoring systems, as there is no precise record of which task or activity caused the participants' stress. To develop an acceptable standard for continuous stress monitoring, Hovsepian et al. (7) used wearable devices and proposed a data-driven stress assessment model, called the cStress model. To collect the data in this study, the participants were asked to fill out an Ecological Momentary Assessment (EMA) questionnaire 15 times a day, at random hours. The collected EMA self-reports acted as the reference values for stress validation. The cStress model compensated for the unpredictable lag between the stressor and its logging in the EMA self-report.

In the literature, several supervised learning algorithms have been utilized for the detection and classification of stress (8–10). These machine learning algorithms include logistic regression, Gaussian Naive Bayes, Decision Tree, Random Forest, AdaBoost, K-Nearest Neighbors, and many others (4). Dalmeida and Masala (11) investigated the role of electrocardiograph (ECG) features derived from heart rate variability (HRV) for the assessment of stress in drivers. A set of different supervised machine learning algorithms was implemented, and the best recall score achieved was 80%. Similarly, Wang and Guo (12) combined a supervised ensemble classifier with an unsupervised learning classifier and used drivers' galvanic skin response (GSR) data to detect stress. Their proposed model was able to detect stress with an accuracy of 90.1%.

The physiological parameters that are frequently used for stress analysis are respiratory rate, heart rate, skin conductance, skin temperature, and galvanic skin response (13). As supervised learning requires labels for training the classifier, in most cases the labels are either unavailable or inaccurate in real-time data collection (14). Several studies have reported the challenges of labeling stress states and the importance of addressing these issues for the further development of sensor-based stress monitoring systems (15–17). The challenges of poor-quality reference data and human bias encourage the exploration of unsupervised machine learning algorithms for stress detection and monitoring, as unsupervised algorithms do not require reference data.

RELATED WORK: UNSUPERVISED LEARNING CLASSIFICATION

Throughout the literature, most authors have focused on techniques based on supervised learning classification, while the use of unsupervised learning methods is relatively new in the stress monitoring field. Rescio et al. (18) implemented the k-means clustering algorithm for stress classification using heart rate (HR), galvanic skin response (EDA) and electrooculogram (EOG) signals of 11 volunteers. To induce stress, the participants were asked to perform a mental arithmetic task and a complex LEGO assembly without instructions. The authors reported classification accuracies of 70.6% with heart rate, 74.6% with EDA and 63.7% with EOG used as single-variable unsupervised classification models. Huysmans et al. (5) proposed a Self-Organizing Maps (SOM) based mental stress detection model that uses skin conductance (SC) and the electrocardiogram (ECG) of the test subjects. The authors recruited a group of 12 subjects and asked them to complete three stress-related tasks (each of 2 min). The first task was the Stroop Word Color test, in which subjects had to select the color of the word rather than the written word. The second task was a mental arithmetic task, in which the subjects had to count backwards from 1,081 in steps of 7. The final task was to talk about a common stressful event that had happened to them. The authors reported an average test accuracy of 79.0% using the proposed SOM-based classifier. Ramos et al. (19) used Naïve Bayes and logistic regression models to classify stress outside laboratory settings. They collected heart rate, breathing rate, skin temperature and acceleration data from 20 volunteers while they were performing physical activity (such as walking, cycling, or sitting). To induce stress, the authors used random noises, verbal mathematical questions, and a cold-water test. The activity data was ignored, and an accuracy of 65% was achieved by the authors. Maaoui et al. (17) investigated the use of three unsupervised learning classification methods [K-mean, Gaussian Mixture Model (GMM), and SOM] to determine stress levels using a low-cost webcam. Along with the webcam, the authors also collected the heart rate (extracting seven attributes) of 12 student volunteers. The authors reported classification error rates of 13.05% (K-means), 44.04% (GMM) and 36.57% (SOM). Similarly, Fiorini et al. (20) compared the performance of three unsupervised classification techniques (K-means, K-medoids, and SOM) with three supervised learning techniques [Support Vector Machine (SVM), Decision Tree (DT), and K-nearest neighbors (K-NN)]. They collected ECG, EDA, and electric brain activity signals of 15 healthy individuals. The authors designed the study to induce three different emotional states (i.e., relaxed, positive, and negative) by means of social interaction. The reported classification accuracy for the best-performing unsupervised classifier (K-means) was 77%, while the best-performing supervised classifier (K-NN) achieved 85% on the same task.

This paper explores the possible use of unsupervised classification methods for physiological stress detection. To perform a comparative analysis of the performance of unsupervised learning algorithms against supervised learning algorithms, two publicly available datasets were used. A total of seven of the most common supervised and seven unsupervised learning algorithms were implemented in the Python programming language. The implemented unsupervised algorithms are Affinity Propagation (21), Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) (22), K-Mean, Mini-Batch
K-Mean (23), Mean Shift, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (24) and Ordering Points To Identify the Clustering Structure (OPTICS) (25). For comparison, supervised learning algorithms such as logistic regression, Gaussian naïve Bayes, decision tree, random forest, AdaBoost and K-nearest neighbors are implemented.

MATERIALS AND METHODS

To address the challenge of manual annotation and labeling of the physiological signal as stress or non-stress in a supervised learning setup, we investigated the efficiency of the commonly used unsupervised machine learning algorithms illustrated in the literature. For assessment of the efficiency of these methods and for comparative analysis, two publicly available datasets were downloaded. The first dataset is provided by the Massachusetts Institute of Technology (MIT), named Stress Recognition in Automobile Drivers by Healey (26), and is available on Physionet, while the second dataset is called the SWELL-KW dataset, available on Kaggle (27). Both datasets contain heart rate variation features and provide labeled heart rate and respiratory rate parameters. The efficiencies of the supervised and unsupervised learning algorithms were benchmarked and are reported using the standard measures of accuracy, precision, recall, and F1-score for each classifier.

Performance Assessment Metrics

The performance of the classifiers is assessed using the following metrics:

• The accuracy of a classifier is defined as the percentage of total correctly predicted labels in the test dataset, given mathematically as Equation 1:

\text{Accuracy} = \frac{\text{true positive labels} + \text{true negative labels}}{\text{total readings}} \quad (1)

• The precision and recall are calculated using Equations 2 and 3:

\text{Precision} = \frac{\text{true positive labels}}{\text{true positive labels} + \text{false positive labels}} \quad (2)

\text{Recall} = \frac{\text{true positive labels}}{\text{true positive labels} + \text{false negative labels}} \quad (3)

• The F1-score of a classifier is the harmonic mean of its precision and recall. Equation 4 shows how the F1-score is calculated:

\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (4)
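These four metrics map directly onto scikit-learn's sklearn.metrics helpers. The snippet below is a minimal sketch; the label vectors are illustrative placeholders, not values from either dataset:

```python
# Minimal sketch: computing Equations 1-4 with scikit-learn.
# The label vectors are illustrative placeholders, not dataset values.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # reference stress (1) / non-stress (0) labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # labels predicted by a classifier

print("Accuracy :", accuracy_score(y_true, y_pred))    # Equation 1
print("Precision:", precision_score(y_true, y_pred))   # Equation 2
print("Recall   :", recall_score(y_true, y_pred))      # Equation 3
print("F1-score :", f1_score(y_true, y_pred))          # Equation 4
```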
Data Collection

To explore the usability of unsupervised machine learning classifiers in stress monitoring, and for comparison with supervised learning methods, two publicly available datasets were downloaded. Details of both datasets are described below.

Stress Recognition in Automobile Drivers Dataset

This dataset was developed by Healey (28, 29) during her PhD program at MIT. The dataset consists of the electrocardiogram (ECG), galvanic skin response (GSR), electromyogram (EMG), respiratory rate, and heart rate measured using wearable sensors, along with stress/non-stress labels generated from a combination of questionnaires and captured videos of the drivers. A total of 18 young drivers were asked to drive in different stress-inducing scenarios, such as highways, rush hours and red lights, as well as a non-stress scenario (marked as non-stress or baseline readings). To rate the drivers' stress levels, three different methods were used: self-reporting questionnaires, experimental design, and metrics defined by independent annotators based on the video recordings of the drivers. The dataset has baseline readings along with three different stress level readings (low, medium, and high stress).

SWELL-KW Dataset

The SWELL-Knowledge Work (SWELL-KW) dataset (27) provides heart rate variability (HRV) indices from sensor data for stress monitoring in an office work environment. The experiment was conducted on 25 subjects performing typical office work, such as preparing presentations, reading emails, and preparing work reports. Three different working conditions were defined by the authors:

• Neutral/no-stress: the subjects were allowed to complete the given task with no time boundary.
• Time pressure (a stress condition): the time to complete the given task was reduced to 2/3 of the time the subject took in the neutral condition.
• Interruption (a stress condition): during this time, subjects received 8 different emails. Some of the emails were related to their task and asked them to take specific actions, while other emails were not related to their task.

The experiment recorded facial expression, computer logging, skin conductance and the ECG signal. For labeling, the Rating Scale Mental Effort (RSME) (30) and the Self-Assessment-Manikin Scale (SAMS) (31) were used. Moreover, all subjects were also asked to report their perceived stress on a 10-point scale (from not-stressed to very stressed) using a visual analog scale.

Unsupervised Classification Algorithms

Most of the unsupervised classification algorithms are based on clustering. Clustering algorithms find the best-suited natural groups within the given feature space. In this study, the sensor data for the stress and non-stress states of the participants are considered as the feature vector. The most widely used unsupervised classifiers implemented in this study are introduced in the following subsections; a sketch of how their cluster output can be scored against reference labels is given below.
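Because clustering returns anonymous cluster ids rather than stress/non-stress labels, some mapping between the two is needed before the metrics of Equations 1-4 can be computed. A common convention is to assign each cluster the majority reference label of its members; the paper does not spell out its own mapping rule, so the sketch below is an assumption:

```python
# Hedged sketch: mapping anonymous cluster ids to stress/non-stress labels by
# majority vote, so clustering output can be scored like a classifier.
# The majority-vote rule is an assumption; the paper does not describe its own.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score

X, y_true = make_blobs(n_samples=200, centers=2, random_state=0)  # stand-in features
cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Give each cluster the majority reference label of its members.
y_pred = np.empty_like(y_true)
for cid in np.unique(cluster_ids):
    members = cluster_ids == cid
    y_pred[members] = np.bincount(y_true[members]).argmax()

print("Accuracy after majority mapping:", accuracy_score(y_true, y_pred))
```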
Affinity Propagation

Affinity propagation takes as input a measure of similarity between pairs of data points. Each data point within the dataset sends a message to all other data points about the target's relative attractiveness. Once the sender is associated with its target (stress/no-stress), the target becomes an exemplar. All the points with the same exemplar are combined to form one cluster. The classifier finds a set of exemplars (representative points of each cluster) that best summarizes the data points within the dataset (21).

BIRCH Classifier

The Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) classifier constructs a tree structure from which classification cluster centroids are obtained. The BIRCH classification algorithm utilizes the tree structure to cluster the input data. The tree structure is called a clustering feature tree (CF tree), and each node of the tree is made of a clustering feature (CF). BIRCH clusters multi-dimensional input data entities to produce the best number of clusters within the available memory and time constraints. The algorithm typically finds good clusters within a single scan and can improve their quality with additional scans (22).

K-Mean Classifier

The K-mean classifier is one of the most frequently used unsupervised learning classifiers. The algorithm assigns a group label to each data point so as to minimize the overall variance of each cluster (23). The algorithm starts with a random group of centroids, considering each centroid as a cluster, and performs repetitive calculations to adjust the positions of the centroids. The algorithm stops optimizing the clusters when the centroids are stable (no change in their values) or when a defined number of iterations is reached.

Mini-Batch K-Mean Classifier

The Mini-Batch K-mean classifier is a modified version of the K-mean classifier. It clusters the dataset using mini-batches of data points rather than the whole data. This makes the classifier robust to statistical noise and lets it cluster large datasets more quickly (23).
TABLE 1 | Hyperparameter settings of the implemented unsupervised classifiers (all evaluated with 10-fold cross validation).

Classifier | Hyperparameter settings | Python library
Affinity propagation | Damping factor = 0.8, to maintain the current value relative to the incoming value (weight 1 - damping); maximum iterations = 200; maximum number of iterations with no change in the number of estimated clusters = 15 | sklearn.cluster
BIRCH | Threshold below which the radius of a subcluster should stay = 0.5; number of clusters = length of unique ids in the training set (default = 2) | sklearn.cluster
DBSCAN | Maximum distance between two samples for consideration as neighbors (eps) = 0.50; minimum samples in the neighborhood of a point to consider it a core point = 9; distance metric = "euclidean" | sklearn.cluster
K-mean | Number of clusters was set to 2 | sklearn.cluster
Mini-batch K-mean | Number of clusters was set to 2 | sklearn.cluster
Mean shift | Number of clusters = length of unique ids in the training set (default = 2) | sklearn.cluster
OPTICS | Maximum distance between two samples for consideration as neighbors (eps) = 0.80; minimum samples in the neighborhood of a point to consider it a core point = 10 | sklearn.cluster
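The settings in Table 1 correspond to the following sklearn.cluster constructors. This is a hedged reconstruction: argument names follow scikit-learn's API, and any parameter not listed in Table 1 is left at its library default:

```python
# Hedged reconstruction of Table 1 as sklearn.cluster constructors.
# Parameters not listed in Table 1 keep scikit-learn defaults.
from sklearn.cluster import (AffinityPropagation, Birch, DBSCAN, KMeans,
                             MeanShift, MiniBatchKMeans, OPTICS)

n_clusters = 2  # length of unique ids in the training set (default = 2)

clusterers = {
    "Affinity propagation": AffinityPropagation(damping=0.8, max_iter=200,
                                                convergence_iter=15),
    "BIRCH": Birch(threshold=0.5, n_clusters=n_clusters),
    "DBSCAN": DBSCAN(eps=0.50, min_samples=9, metric="euclidean"),
    "K-mean": KMeans(n_clusters=n_clusters),
    "Mini-batch K-mean": MiniBatchKMeans(n_clusters=n_clusters),
    "Mean shift": MeanShift(),  # Table 1's "number of clusters" entry has no
                                # direct MeanShift argument; defaults used here
    "OPTICS": OPTICS(max_eps=0.80, min_samples=10),  # Table 1 lists eps = 0.80;
                                # max_eps is the closest scikit-learn argument
}
```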
Mean Shift Classifier

The mean shift classifier finds the underlying density function and classifies the data based on the density distribution of the data points in feature space (32). The mean shift classification algorithm tries to discover distinct blobs within a smoothed density estimate of the given dataset. The algorithm updates candidates for centroids, which are then taken as the mean of the points within a given region. These candidates are filtered to eliminate near-duplicate centroids, forming the final set of centroids that define the clusters.

DBSCAN Classifier

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) finds the highest-density areas in the given feature domain and expands those areas, forming clusters of the feature space (stress/non-stress) (24). DBSCAN finds neighborhoods of a data point exceeding a specified density threshold. This threshold is defined by the minimum number of data points required within a radius of the neighborhood (minPts) and the radius of the neighborhood (eps). Both parameters are initialized manually at the start of the algorithm.

OPTICS Classifier

Ordering Points To Identify the Clustering Structure (OPTICS) is derived from the DBSCAN classifier, where a minimum number of samples is required as a hyper-parameter to classify the data as a cluster (feature) (25).
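A brief sketch of the two density-based classifiers, using the eps and min_samples settings from Table 1, follows. Note that scikit-learn marks points falling in no dense region with the label -1; how such points were treated in this study is not specified, so the handling below is left open:

```python
# Sketch: density-based clustering with the Table 1 settings.
# Points outside any dense region get the label -1 ("noise") in scikit-learn;
# the paper does not state how such points were handled.
import numpy as np
from sklearn.cluster import DBSCAN, OPTICS
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=500, centers=2, random_state=0)
X = StandardScaler().fit_transform(X)  # eps is scale-sensitive

db = DBSCAN(eps=0.50, min_samples=9).fit(X)
op = OPTICS(max_eps=0.80, min_samples=10).fit(X)

print("DBSCAN cluster labels:", np.unique(db.labels_))   # e.g., [-1, 0, 1]
print("OPTICS cluster labels:", np.unique(op.labels_))
```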
Supervised Classification Algorithms

For benchmarking, this study also implemented commonly used supervised learning classifiers: logistic regression, Gaussian naïve Bayes, decision tree, random forest, AdaBoost and K-nearest neighbors for comparison of results with the unsupervised classifiers. All these algorithms are briefly defined below. Interested readers are referred to Chaitra and Kumar (33) for details.

Logistic Regression Classifier

Logistic regression is one of the simplest machine learning algorithms, mostly used for binary classification problems. Logistic regression estimates and classifies based on the relationship between the independent features and the dependent binary target within a dataset.

Gaussian Naïve Bayes Classifier

The Naive Bayesian classifier is a probabilistic classifier. A Naive Bayesian (NB) model has only one parent node in its directed acyclic graph (DAG), which is an unobserved node, and many children nodes representing the observed nodes. NB works with the strong assumption that all the child nodes are independent given their parent node; thus, one may say that the Naïve Bayesian classifier is a type of probabilistic estimator.

Decision Tree Classifier

The decision tree classifies by sorting input instances based on feature values. Each node of the decision tree represents a feature of an input instance, while each branch represents a value the node can assume. Classification of instances starts from the root, and instances are sorted downwards depending on their feature values.
FIGURE 1 | Block diagram of the implemented classification methods illustrating pre-processing, classification, and post-processing stages.
TABLE 2A | Results of supervised learning algorithms on the Stress Recognition in Automobile Drivers dataset (70-30% train-test split).

Feature: heart rate and respiratory rate
Classifier | Classification accuracy | Precision | Recall | F1-score
Logistic regression | 59.3% | 0.59 | 0.59 | 0.59
Gaussian Naive Bayes | 56.5% | 0.60 | 0.59 | 0.59
Decision tree | 63.4% | 0.64 | 0.64 | 0.63
Random forest | 65.0% | 0.65 | 0.66 | 0.65
AdaBoost | 66.8% | 0.67 | 0.66 | 0.65
KNN (k = 5) | 63.7% | 0.63 | 0.63 | 0.63
KNN (k = 2) | 58.1% | 0.60 | 0.57 | 0.56

Feature: heart rate
Classifier | Classification accuracy | Precision | Recall | F1-score
Logistic regression | 58.4% | 0.59 | 0.58 | 0.58
Gaussian Naive Bayes | 56.0% | 0.59 | 0.56 | 0.55
Decision tree | 61.9% | 0.66 | 0.62 | 0.57
Random forest | 56.2% | 0.56 | 0.56 | 0.56
AdaBoost | 61.5% | 0.61 | 0.61 | 0.60
KNN (k = 5) | 54.4% | 0.54 | 0.54 | 0.54
KNN (k = 2) | 51.7% | 0.55 | 0.52 | 0.50

Feature: respiratory rate
Classifier | Classification accuracy | Precision | Recall | F1-score
Logistic regression | 63.2% | 0.70 | 0.63 | 0.55
Gaussian Naive Bayes | 63.4% | 0.72 | 0.63 | 0.55
Decision tree | 62.4% | 0.64 | 0.62 | 0.63
Random forest | 56.9% | 0.57 | 0.57 | 0.57
AdaBoost | 66.8% | 0.66 | 0.67 | 0.67
KNN (k = 5) | 59.5% | 0.59 | 0.60 | 0.59
KNN (k = 2) | 54.0% | 0.58 | 0.54 | 0.53
TABLE 2B | Results of supervised learning algorithms on the Stress Recognition in Automobile Drivers dataset (10-fold cross validation).

Feature: heart rate and respiratory rate
Classifier | Classification accuracy | Standard deviation | Confidence limits (lower-upper)
Logistic regression | 61.5% | 0.038 | 58.8%-64.2%
Gaussian Naive Bayes | 61.6% | 0.022 | 58.9%-64.3%
Decision tree | 64.1% | 0.047 | 61.5%-66.8%
Random forest | 64.0% | 0.029 | 61.3%-66.6%
AdaBoost | 65.6% | 0.036 | 62.9%-68.2%
KNN (k = 2) | 54.9% | 0.051 | 52.2%-57.6%
KNN (k = 5) | 58.6% | 0.034 | 55.9%-61.3%

Feature: heart rate
Classifier | Classification accuracy | Standard deviation | Confidence limits (lower-upper)
Logistic regression | 58.7% | 0.020 | 57.2%-60.2%
Gaussian Naive Bayes | 56.4% | 0.024 | 54.9%-57.9%
Decision tree | 59.9% | 0.019 | 58.4%-61.4%
Random forest | 57.5% | 0.027 | 56.0%-59.0%
AdaBoost | 59.9% | 0.016 | 58.4%-61.4%
KNN (k = 2) | 52.0% | 0.023 | 50.4%-53.5%
KNN (k = 5) | 56.1% | 0.024 | 54.6%-57.6%

Feature: respiratory rate
Classifier | Classification accuracy | Standard deviation | Confidence limits (lower-upper)
Logistic regression | 58.3% | 0.037 | 55.6%-61.0%
Gaussian Naive Bayes | 58.7% | 0.038 | 56.0%-61.4%
Decision tree | 61.4% | 0.053 | 58.7%-64.0%
Random forest | 59.4% | 0.050 | 56.7%-62.1%
AdaBoost | 63.9% | 0.036 | 61.2%-66.5%
KNN (k = 2) | 54.6% | 0.039 | 51.9%-57.4%
KNN (k = 5) | 59.0% | 0.052 | 56.3%-61.7%
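The cross-validated figures in Tables 2B and 4B follow the standard 10-fold protocol. A minimal sketch with a stand-in classifier and synthetic data is given below; note that the paper's exact confidence-interval construction is not stated, so only the per-fold mean and standard deviation are computed:

```python
# Sketch of the 10-fold cross-validation protocol. The classifier and data are
# stand-ins; the paper's exact confidence-interval construction is not stated.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)
scores = cross_val_score(AdaBoostClassifier(), X, y, cv=10, scoring="accuracy")
print(f"mean accuracy = {scores.mean():.3f}, standard deviation = {scores.std():.3f}")
```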
Random Forest Classifier

The random forest classifier is an ensemble of individual classifiers like decision trees, and the training method selected is always bagging, as in bagging the learning models are linearly combined to increase the overall accuracy. While growing new trees, random forest adds more randomness to the existing model. Instead of finding the most important target feature for node splitting, this algorithm searches for the best feature in a random subset of the target features. In this way, we get wide diversity, which in return results in a better model. Since random forest only considers a random subset of features for splitting a node, the trees of the model can be made even more random by using a random threshold for every feature rather than looking for the best threshold value.

K-Nearest Neighbors Classifier

The k-nearest neighbor (kNN) classifier is one of the simplest instance-based learning algorithms. kNN works as follows: it groups proximate instances in a database into a single group; when a new instance (feature vector) arrives, the classifier observes the properties of the instance and places it into the closest matched group (its nearest neighbors). For accurate classification, initializing the value of k is the most critical step in the kNN classifier.
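The seven supervised classifiers defined above have direct scikit-learn counterparts. The following is a minimal sketch of the comparison loop; the 70-30% train-test split matches the paper, while the synthetic data and all classifier settings other than k are assumptions (scikit-learn defaults):

```python
# Minimal sketch of the supervised baseline loop (70-30% split as in the paper;
# synthetic data and default classifier settings are assumptions).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)  # stand-in features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    "Logistic regression": LogisticRegression(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision tree": DecisionTreeClassifier(),
    "Random forest": RandomForestClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "KNN (k = 5)": KNeighborsClassifier(n_neighbors=5),
    "KNN (k = 2)": KNeighborsClassifier(n_neighbors=2),
}
for name, clf in classifiers.items():
    acc = accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
    print(f"{name}: {acc:.3f}")
```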
TABLE 3 | Results of unsupervised learning algorithms on the Stress Recognition in Automobile Drivers dataset (70-30% train-test split).

Feature: heart rate and respiratory rate
Classifier | Classification accuracy | Precision | Recall | F1-score
Affinity propagation | 63.8% | 0.65 | 0.64 | 0.62
BIRCH | 54.9% | 0.62 | 0.57 | 0.50
DBSCAN | 53.8% | 0.56 | 0.54 | 0.41
K-mean | 55.7% | 0.62 | 0.56 | 0.52
Mini-batch K-mean | 53.0% | 0.28 | 0.53 | 0.37
Mean shift | 53.0% | 0.28 | 0.53 | 0.37
OPTICS | 54.1% | 0.54 | 0.54 | 0.53

Feature: heart rate
Classifier | Classification accuracy | Precision | Recall | F1-score
Affinity propagation | 59.7% | 0.60 | 0.82 | 0.69
BIRCH | 49.1% | 0.66 | 0.49 | 0.38
DBSCAN | 54.7% | 0.30 | 0.55 | 0.39
K-mean | 55.5% | 0.61 | 0.55 | 0.53
Mini-batch K-mean | 54.8% | 0.61 | 0.55 | 0.52
Mean shift | 54.7% | 0.30 | 0.55 | 0.39
OPTICS | 51.6% | 0.51 | 0.52 | 0.51

Feature: respiratory rate
Classifier | Classification accuracy | Precision | Recall | F1-score
Affinity propagation | 65.0% | 0.77 | 0.65 | 0.57
BIRCH | 57.4% | 0.33 | 0.57 | 0.42
DBSCAN | 60.6% | 0.62 | 0.61 | 0.53
K-mean | 59.8% | 0.63 | 0.60 | 0.60
Mini-batch K-mean | 60.3% | 0.60 | 0.60 | 0.60
Mean shift | 57.4% | 0.33 | 0.57 | 0.42
OPTICS | 54.6% | 0.49 | 0.55 | 0.46
TABLE 4A | Results of supervised learning algorithms on the SWELL-KW dataset (70-30% train-test split).

Feature: heart rate
Classifier | Classification accuracy | Precision | Recall | F1-score
Logistic regression | 70.2% | 0.70 | 0.70 | 0.64
Gaussian Naive Bayes | 70.3% | 0.70 | 0.70 | 0.64
Decision tree | 74.8% | 0.74 | 0.75 | 0.73
Random forest | 74.8% | 0.74 | 0.75 | 0.73
AdaBoost | 74.6% | 0.75 | 0.75 | 0.71
KNN (k = 5) | 71.8% | 0.71 | 0.72 | 0.71
KNN (k = 2) | 62.7% | 0.68 | 0.63 | 0.64
TABLE 4B | Results of supervised learning algorithms on the SWELL-KW dataset (10-fold cross validation).

Feature: heart rate
Classifier | Classification accuracy | Standard deviation | Confidence limits (lower-upper)
Logistic regression | 70.2% | 0.002 | 70.0%-70.4%
Gaussian Naive Bayes | 70.3% | 0.002 | 70.4%-70.5%
Decision tree | 74.8% | 0.002 | 74.6%-75.0%
Random forest | 75.0% | 0.003 | 74.8%-75.2%
AdaBoost | 74.6% | 0.003 | 74.4%-74.8%
KNN (k = 2) | 62.8% | 0.002 | 62.6%-63.0%
KNN (k = 5) | 72.0% | 0.003 | 71.8%-72.2%
RESULTS AND DISCUSSION

For each classifier, the performance measures (classification accuracy, precision, recall, F1-score, standard deviation, and 95% confidence intervals) are calculated and reported.

The performance of the unsupervised and supervised learning algorithms was tested on the two datasets. The Stress Recognition in Automobile Drivers dataset was the smaller dataset, with 4,129 data points for each feature, i.e., heart rate and respiratory rate, along with stress/non-stress labels. The SWELL-KW dataset was a relatively larger dataset, with a total of 204,885 data points for the heart rate feature along with stress/non-stress conditions. Each data point is considered as a separate sample and is selected randomly for the test and train sets for the supervised learning classifiers.

In a real-time setting, the unsupervised classifier is first fed with control data and asked to partition the data into stress and non-stress conditions. Each new data point is then passed to the classifier and, based on the centroids calculated from the control data, the new data point is placed in a specific cluster. For comparison, a set of different supervised learning classifiers was also implemented, and the performance of the classifiers was evaluated using classification accuracy, precision, recall, and F1-score. The results of the classifiers are discussed below.
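As an illustration of the control-then-assign procedure described above, the following is a hedged sketch using K-means, since fitting on control data and assigning new points to the nearest learned centroid is exactly its predict step. The readings are synthetic stand-ins, not values from either dataset:

```python
# Sketch of the described real-time procedure: fit clusters on control data,
# then assign each newly arriving sample to the nearest learned centroid.
# Synthetic stand-in data; not values from either dataset.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Control recordings: two regimes of (heart rate, respiratory rate) readings.
control = np.vstack([rng.normal([70, 14], 2, size=(100, 2)),    # resting-like
                     rng.normal([95, 20], 2, size=(100, 2))])   # aroused-like

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(control)

new_sample = np.array([[92.0, 19.0]])  # incoming reading
print("Assigned cluster:", model.predict(new_sample)[0])
```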
Stress Recognition in Automobile Drivers Dataset

It is a well-known fact that all traditional machine learning classifiers are data-hungry. As the Stress Recognition in Automobile Drivers dataset is a smaller dataset, the highest classification accuracy achieved (with a 70-30% train-test split) using combined heart rate and respiratory rate with a supervised learning algorithm is 66.8% (AdaBoost classifier), while for the single-feature models, i.e., heart rate and respiratory rate separately, the highest classification accuracies are 61.9% (Decision Tree classifier) and 66.8% (AdaBoost classifier), respectively. These results are better than previously reported accuracy values (52.6 and 62.2% for the heart rate and respiratory rate models) (26). Similarly, when combined heart rate and respiratory rate are used with unsupervised learning classification, the highest classification accuracy achieved is 63.8% (Affinity Propagation classifier). If a single-feature model is used, the highest accuracy for the heart rate feature model is 59.7%, while for the respiratory rate feature model it is 65%, both using the Affinity Propagation classifier. K-fold cross-validation (with k = 10) was also performed using the supervised learning models. The highest achieved accuracies for a single-feature model are 59.9% for heart rate and 63.9% for respiratory rate, while the two-feature model (heart rate and respiratory rate combined) gave an accuracy of 65.6%. Detailed analyses of the different supervised and unsupervised learning algorithms are presented in Tables 2A,B, 3.

SWELL-KW Dataset

The results of the different supervised and unsupervised learning algorithms on the SWELL-KW dataset are presented in Tables 4A,B, 5.
TABLE 5 | Results of unsupervised learning algorithms on the SWELL-KW dataset (70-30% train-test split).

Feature: heart rate
Classifier | Classification accuracy | Precision | Recall | F1-score
Affinity propagation | 66.5% | 0.44 | 0.67 | 0.53
BIRCH | 68.1% | 0.66 | 0.68 | 0.60
K-mean | 66.7% | 0.45 | 0.67 | 0.53
Mini-batch K-mean | 66.7% | 0.45 | 0.67 | 0.53
Mean shift | 68.3% | 0.69 | 0.68 | 0.60
DBSCAN | 66.7% | 0.45 | 0.67 | 0.53
OPTICS | 66.7% | 0.45 | 0.67 | 0.53
TABLE 6 | Comparison of supervised learning results on both datasets with previously reported work.

Datasets | Classifier type | Ref | Feature | Highest reported classification accuracy | Highest achieved classification accuracy [this study] with 70-30% split | Highest achieved classification accuracy [this study] with K-fold validation
(Table rows are not recoverable from the source layout.)
The highest classification accuracy achieved (with a 70-30% train-test split) using a supervised learning algorithm is 74.8% (Decision Tree and Random Forest classifiers), which is better than the previously reported result for one physiological modality (accuracy = 64.1%) in (34), while for unsupervised learning it is 68.3% (Mean Shift classifier). The overall classification accuracies of the supervised classifiers do not change significantly when k-fold cross-validation is applied to the data; the highest classification accuracy achieved using 10-fold validation is 75.0%. The other performance metrics, precision, recall, and F1-score, follow similar trends as the accuracy on both datasets.

Summary

The results of the supervised learning classification algorithms are better than the previously reported results (26, 34) on the same datasets; see Table 6. As both datasets contain real physiological signals, there are some outliers and noisy segments within the signals. Thus, intensive pre-processing and outlier detection were performed to cleanse the datasets for better training of the classification algorithms. The achievement of better results than the previously published ones reflects that the performed pre-processing steps (thresholding and filtering) do help in developing a better classification model.

The authors acknowledge that these accuracies are not indicative of good performance, but they should motivate researchers to propose better supervised as well as unsupervised learning classification models for improved stress monitoring. Figure 2 shows bar plots of the classification accuracies of the supervised and unsupervised classification algorithms on the Stress Recognition in Automobile Drivers dataset (Figure 2A) and the SWELL-KW dataset (Figure 2B). The use of an unsupervised classifier is important for the development of a non-invasive, robust, and continuous stress monitoring device, since labeling the physiological signal in the ambulatory environment is a difficult and inaccurate task. The results in Tables 2-5 compare the classification efficiencies of the supervised and unsupervised classification algorithms. The differences in the highest classification accuracies are small: for the Stress Recognition in Automobile Drivers dataset, 1% for the respiratory rate-based model and 3% for the two-feature model, while for the SWELL-KW dataset the difference is 6.5%. The overall accuracies of the supervised classifiers are better than those of the unsupervised classifiers, but as an unsupervised machine learning classifier does not require any intensive pre-training or stress/non-stress labels, these results encourage the use of unsupervised models in stress monitoring wearable devices. Further improvements of unsupervised algorithms optimized for stress monitoring can potentially provide even better detection accuracies.

FIGURE 2 | Bar plot of classification accuracies of supervised and unsupervised classification algorithms using (A) the Stress Recognition in Automobile Drivers dataset and (B) the SWELL-KW dataset.

CONCLUSION

Stress detection in a real-world environment is a complex task, as the labeling of the collected physiological signals is often inaccurate or non-existent. Questionnaires and self-reports are considered the only established way of obtaining the reference emotional state of a participant. Supervised machine learning classifiers have been able to accurately classify the stress state from the non-stress state. The problem of stress-level labeling has been reported in many studies but has rarely been addressed.
One possible solution is the use of unsupervised machine learning classifiers, as such algorithms do not require labeled data. In this study, we have implemented different unsupervised classification algorithms to explore the feasibility of unsupervised stress detection and monitoring in different stress monitoring scenarios. For comparison, a set of different supervised learning algorithms was also implemented.
REFERENCES

1. Richard L, Hurst T, Lee J. Lifetime exposure to abuse, current stressors, and health in federally qualified health center patients. J Hum Behav Soc Environ. (2019) 29:593–607. doi: 10.1080/10911359.2019.1573711
2. Hemmingsson E. Early childhood obesity risk factors: socioeconomic adversity, family dysfunction, offspring distress, and junk food self-medication. Curr Obes Rep. (2018) 7:204–9. doi: 10.1007/s13679-018-0310-2
3. Everly GS, Lating JM. The anatomy and physiology of the human stress response. In: A Clinical Guide to the Treatment of the Human Stress Response. New York, NY: Springer (2019). p. 19–56.
4. Iqbal T, Elahi A, Redon P, Vazquez P, Wijns W, Shahzad A. A review of biophysiological and biochemical indicators of stress for connected and preventive healthcare. Diagnostics. (2021) 11:556. doi: 10.3390/diagnostics11030556
5. Huysmans D, Smets E, De Raedt W, Van Hoof C, Bogaerts K, Van Diest I, et al. Unsupervised learning for mental stress detection - exploration of self-organizing maps. Proc Biosignals. (2018) 4:26–35. doi: 10.5220/0006541100260035
6. Li R, Liu Z. Stress detection using deep neural networks. BMC Med Inform Decis Mak. (2020) 20:1–10. doi: 10.1186/s12911-020-01299-4
7. Hovsepian K, Al'Absi M, Ertin E, Kamarck T, Nakajima M, Kumar S. cStress: towards a gold standard for continuous stress assessment in the mobile environment. In: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing. Osaka (2015). p. 493–504.
8. McDonald AD, Sasangohar F, Jatav A, Rao AH. Continuous monitoring and detection of post-traumatic stress disorder (PTSD) triggers among veterans: a supervised machine learning approach. IISE Trans Healthc Syst Eng. (2019) 9:201–11. doi: 10.1080/24725579.2019.1583703
9. Leightley D, Williamson V, Darby J, Fear NT. Identifying probable post-traumatic stress disorder: applying supervised machine learning to data from a UK military cohort. J Ment Heal. (2019) 28:34–41. doi: 10.1080/09638237.2018.1521946
10. Schmidt P, Duerichen R, Van Laerhoven K, Marberger C, Reiss A. Introducing WESAD, a multimodal dataset for wearable stress and affect detection. In: Proceedings of the 20th ACM International Conference on Multimodal Interaction. Boulder, CO (2018). p. 400–8. doi: 10.1145/3242969.3242985
11. Dalmeida KM, Masala GL. HRV features as viable physiological markers for stress detection using wearable devices. Sensors. (2021) 21:2873. doi: 10.3390/s21082873
12. Wang K, Guo P. An ensemble classification model with unsupervised representation learning for driving stress recognition using physiological signals. IEEE Trans Intell Transp Syst. (2020) 22:3303–15. doi: 10.1109/TITS.2020.2980555
13. Iqbal T, Redon-Lurbe P, Simpkin AJ, Elahi A, Ganly S, Wijns W, et al. A sensitivity analysis of biophysiological responses of stress for wearable sensors in connected health. IEEE Access. (2021) 9:93567–79. doi: 10.1109/ACCESS.2021.3082423
14. Vildjiounaite E, Kallio J, Mäntyjärvi J, Kyllönen V, Lindholm M, Gimel'farb G. Unsupervised stress detection algorithm and experiments with real life data. In: EPIA Conference on Artificial Intelligence. Porto (2017). p. 95–107.
15. Larradet F, Niewiadomski R, Barresi G, Caldwell DG, Mattos LS. Toward emotion recognition from physiological signals in the wild: approaching the methodological issues in real-life data collection. Front Psychol. (2020) 11:1111. doi: 10.3389/fpsyg.2020.01111
16. Adams P, Rabbi M, Rahman T, Matthews M, Voida A, Gay G, et al. Towards personal stress informatics: comparing minimally invasive techniques for measuring daily stress in the wild. In: Proceedings of the 8th International Conference on Pervasive Computing Technologies for Healthcare. Oldenburg (2014). p. 72–9.
17. Maaoui C, Pruski A. Unsupervised stress detection from remote physiological signal. In: 2018 IEEE International Conference on Industrial Technology (ICIT). Lyon (2018). p. 1538–43.
18. Rescio G, Leone A, Siciliano P. Unsupervised-based framework for aged worker's stress detection. Work Artif Intell an Ageing Soc. (2020) 2804:81–7. Available online at: http://ceur-ws.org/Vol-2804/short3.pdf
19. Ramos J, Hong J-H, Dey AK. Stress recognition - a step outside the lab. In: PhyCS. Lisbon (2014). p. 107–18.
20. Fiorini L, Mancioppi G, Semeraro F, Fujita H, Cavallo F. Unsupervised emotional state classification through physiological parameters for social robotics applications. Knowledge-Based Syst. (2020) 190:105217. doi: 10.1016/j.knosys.2019.105217
21. Frey BJ, Dueck D. Clustering by passing messages between data points. Science. (2007) 315:972–6. doi: 10.1126/science.1136800
22. Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. ACM Sigmod Rec. (1996) 25:103–14. doi: 10.1145/235968.233324
23. Sculley D. Web-scale k-means clustering. In: Proceedings of the 19th International Conference on World Wide Web. New York, NY (2010). p. 1177–8.
24. Ester M, Kriegel H-P, Sander J, Xu X, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD. Portland (1996). p. 226–31.
25. Ankerst M, Breunig MM, Kriegel H-P, Sander J. OPTICS: ordering points to identify the clustering structure. ACM Sigmod Rec. (1999) 28:49–60. doi: 10.1145/304181.304187
26. Healey JA. Wearable and Automotive Systems for Affect Recognition From Physiology. Cambridge, MA: Massachusetts Institute of Technology (2000).
27. Koldijk S, Sappelli M, Verberne S, Neerincx MA, Kraaij W. The SWELL knowledge work dataset for stress and user modeling research. In: Proceedings of the 16th International Conference on Multimodal Interaction. Istanbul (2014). p. 291–8.
28. Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. (2000) 101:e215–20. doi: 10.1161/01.CIR.101.23.e215
29. Healey JA, Picard RW. Detecting stress during real-world driving tasks using physiological sensors. IEEE Trans Intell Transp Syst. (2005) 6:156–66. doi: 10.1109/TITS.2005.848368
30. Widyanti A, Johnson A, de Waard D. Adaptation of the rating scale mental effort (RSME) for use in Indonesia. Int J Ind Ergon. (2013) 43:70–6. doi: 10.1016/j.ergon.2012.11.003
31. Bynion T-M, Feldner MT. Self-assessment manikin. Encycl Personal Individ Differ. (2020) 4654–6. doi: 10.1007/978-3-319-24612-3_77
32. Comaniciu D, Meer P. Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell. (2002) 24:603–19. doi: 10.1109/34.1000236
33. Chaitra PC, Kumar RS. A review of multi-class classification algorithms. Int J Pure Appl Math. (2018) 118:17–26. Available online at: https://www.acadpubl.eu/jsi/2018-118-14-15/articles/14/3.pdf
34. Koldijk S, Neerincx MA, Kraaij W. Detecting work stress in offices by combining unobtrusive sensors. IEEE Trans Affect Comput. (2016) 9:227–39. doi: 10.1109/TAFFC.2016.2610975

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright © 2022 Iqbal, Elahi, Wijns and Shahzad. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.