Bakker Hacdais2011
I. INTRODUCTION
Stress at work has become a serious problem affecting many people of different professions, life situations, and age groups. The workplace has changed dramatically due to globalization of the economy, use of new information and communications technologies, growing diversity in the workplace, and increased mental workload. In the 2000 European Working Conditions Survey (EWCS) [12], work-related stress was found to be the second most common work-related health problem across the EU. 62% of Americans say work has a significant impact on stress levels. 54% of employees are concerned about health problems caused by stress. One in four employees has taken a mental health day off from work to cope with stress (APA Survey 2004).
Stress can contribute to illness directly, through its physiological effects, or indirectly, through maladaptive health behaviors (for example, smoking, poor eating habits or lack of sleep) [4]. It is important to motivate people to adjust their behavior and life style and to start using appropriate stress coping strategies, so that they achieve a better stress balance long before an increased level of stress results in serious health problems.
The avoidance of stress in the everyday working environment is impossible. Still, if people are informed of their stress levels, they become empowered to take preemptive actions in order to alleviate stress [16].
Fig. 1. Stress@work in a nutshell: stress detection, prediction and coaching
There are a number of factors that are likely to cause stress at work, including but not limited to long work hours, work overload, time pressure, difficult, demanding or complex tasks, high responsibility, lack of breaks, conflicts, underpromotion, lack of training, job insecurity, lack of variety, and poor physical work conditions (limited space, inconvenient temperature, limited or inappropriate lighting conditions) [10].
In [1] we proposed the conceptual framework for managing stress at work. One very important step in the process of stress management is making the worker aware of the past, current or expected stress. We aim at automating the identification of the stress causes of an employee in question, as well as the identification of the common causes of stress for employees within an organisation. Figure 1 shows the main ideas of our approach: we aim at making stress and stressors visible by (1) keeping track of the calendar events and daily routine of the worker, (2) measuring stress-related physiological signs from the sensor data, (3) annotating these events with the sensor data and the results of automated analysis of additional information sources, such as sentiment classification of the incoming and outgoing e-mails or social media messages [18] and explicit
user feedback, (4) extracting the relationship between event data and sensor data, i.e. relating the increases and decreases in the stress level to the characteristics of the events of daily life (what, where, when, with whom, etc.), and (5) using extracted knowledge about this relationship for personalized coaching.
Fig. 2. The reaction to stress factors is governed by the autonomic nervous system. This path is shared with a lot of other mechanisms. (Diagram labels: sympathetic system; heart rate; sweat production; other; other factors.)
In order to find this relationship, a number of subtasks need to be done. One of the main subtasks is detecting stress from the sensor data. Due to modern ICT and sensor technologies, objective measuring of the stress level in non-lab settings becomes possible. Such symptoms as voice, heart rate, galvanic skin response (GSR) and facial expressions are known to be highly correlated with the level of stress a person experiences [3], [5], [7]. In this paper we focus on the use of the GSR data (reflecting sweating) measured by a prototype device worn on the wrist.
Direct use of the GSR measurements is not straightforward. This is partly caused by noise and inaccuracies in the collected sensor data but, more crucially, the reaction to various stress factors is governed by the autonomic nervous system, and this "path" to the sympathetic system is shared with a lot of other mechanisms, such as the mechanism of adaptation to the outside temperature and humidity (Figure 2).
We have conducted a pilot case study aimed at identifying the likely challenges we need to address to make our approach work in practice. In this paper, we focus only on the problem of detecting changes in the stress level from the GSR sensor data alone. We study the peculiarities of noise and disturbances in the signal and argue the need for related contextual data to improve the quality of stress detection.
The rest of this paper is organized as follows. In Section II, we formulate the problem of stress identification and categorization from the sensor data stream mining perspective. We focus on a subproblem of arousal identification in online settings, which we formulate as a drift detection task. We highlight the major problems of dealing with GSR data, collected from a watch-style stress measurement device in normal (i.e. non-lab) settings, and propose simple approaches how [...] further work.
II. ACUTE STRESS IDENTIFICATION
Stress comes in three flavors:
1) Acute: stress caused by an acute short-term stress factor.
2) Episodic acute: acute stress that occurs more frequently and/or periodically.
3) Chronic: stress caused by long-term stress factors, which can be very harmful in the long run.
Most people experience acute stress during their everyday life. It is a primal fight-or-flight response to immediate stress factors and is not considered harmful. When the frequency of these occurrences increases, physiological symptoms might occur. This type of stress is associated with a very busy and chaotic life and can be considered harmful when it occurs over prolonged periods of time. The last type of stress, chronic, is considered to be the most harmful. Prolonged periods of stress could be caused by personal circumstances or other long-term factors.
In our work, we want to prevent people from transferring to the chronic category, and therefore we target the acute and episodic acute stress. In particular, in this paper we focus on the identification of acute stress in order to facilitate coaching of the episodic acute stress.
Acute stress is a mechanism that brings the body into a state of alertness. As shown in Figure 2, it is controlled by the autonomic nervous system. This system maintains a constant equilibrium (also known as homeostasis). A change in this equilibrium results in different changes in the bodily functions (e.g. activity of the digestive system).
Stress can be seen as a state of emergency that is preceded by arousal due to an external stimulus, see Figure 3. After the factor causing stress (the stressor) disappears, the body relaxes and returns to a normal state.
Figure 4 shows the general case with more relationships between the four states depicting the inner process of stress.
The problem of stress identification can be formulated in different ways, e.g. as a traditional classification task, as one-class classification, as event identification, and as time series subsequence classification, to name a few main options.
It should also be noted that acute stress can be positive (e.g. caused by excitement, intrinsic motivation or engagement in the working process) and, consequently, staying in a normal state for too long a period without any acute stress can be a sign of monotonous, uninteresting work or poor motivation of the employee. Therefore, we would like to perform a more detailed classification of the states in the future.
In this paper we consider a simplified setting assuming that a person is either in the normal state or in a stressed state. The change between the two states can be sudden or incremental; typically, arousal is more rapid and relaxation
takes considerably longer. As we will show, different change patterns can be observed.
Fig. 3. An example of an acute stress pattern observed from GSR data and how it can be mapped to the symbolic (time-stamped) representation of a person's stress. (State labels: Normal, Aroused, Stressed, Relaxing.)
Fig. 4. Four states depicting the inner process of stress. (States: Normal, Arousing, Stressed, Relaxing.)
A. Arousal as change detection
The principal task is to detect whether a person is stressed at a particular moment in time or not. In other words, the detector assigns a label "stressed" or "not stressed" based on the observed historic data.
Detecting changes in GSR data is not as straightforward as one might think when looking at the example in Figure 3. Different types of noise in the data and changes in GSR data due to factors other than stressors make it a non-trivial task. In this section, we give illustrative examples of noise and other factors affecting the GSR signal.
[...] due to a poor fit, we get noise in the signal (see Figure 6). A person might also accidentally touch the device (or do this periodically, out of habit), thus increasing the pressure and influencing the GSR measurement; this also creates noise in the signal in the form of gaps (see Figure 5).
Fig. 5. The GSR signal contains two-sided local noise peaks that are probably caused by a physical disturbance of the contact between the skin and the sensors, e.g. if someone has a habit of touching the watch (the stress meter in this case) from time to time. (Axes: GSR vs. time in hours.)
Fig. 6. When the fit between the skin and the sensors is not tight enough, the contact is continuously broken. A characteristic of this behavior is the high number of gaps (ground value of sensor) in the signal. (Axes: GSR vs. time in hours.)
Note that the skin in contact with the device contains slightly more sweat than the skin next to the device, and when the device is shifted on the skin, there is a resettling period of about 15 minutes during which the skin that came in contact with the device reaches about the same level of sweat as the skin that was in contact with the device before the shift, thus resulting in about the same GSR (under the assumption that no change in the stress level happens in this period).
Importance of context. There are a lot of different factors that influence the internal state of a person. Rising GSR levels might be related to a rise in temperature or to heavy physical work or exercise. In other words, the GSR change patterns can be related to contexts that are mostly hidden.
Fig. 7. Prior to a stressful event (red-lined peak), the GSR level is gradually rising. Is this rise caused by an external factor or is it due to anticipation of the event?
Fig. 8. After a stressful event (red-lined peak) the GSR level does not return to the level it had prior to the event. This might indicate that there is no relaxation process. (Axes: GSR vs. time in hours.)
Fig. 9. After a suspected stressful event the GSR level does not return to the level it had prior to the event. This might indicate that there is no relaxation process or, more likely in this case, that the baseline level of GSR corresponding to the normal unstressed state has changed. (Axes: GSR vs. time in hours.)
Fig. 10. Doing physical exercises results in a high GSR level, yet is not related to emotional stress. (Plot annotations: exercise start, exercise end; axes: GSR vs. time in hours.)
Fig. 11. Arousal detection approach: the GSR data is first (1) filtered, (2) aggregated, and (3) discretized in the preprocessing phase and then passed to a change detection technique. Each step is applied to a window of data that is kept until a change has been detected. (Diagram: $\bar{y} = (y_{t_c}, \ldots, y_{t_{curr}})$, $\bar{y}' = f(\bar{y})$, $\bar{y}'' = g(\bar{y}')$, $\mathrm{SAX}(\bar{y}'')$, with window sizes $n$ and $m$.)
One of these patterns is a steady increase of the GSR level (see Figure 7). This might be an indication of changing environmental factors (e.g. temperature), but it might also be a genuine stress response. For instance, once a certain event has been scheduled, the person might get stressed in anticipation of the event. This is an interesting pattern for the stress detection task.
The same holds for the patterns in Figures 8 and 9. In these time series there is a suspected stress peak: in Figure 8 the red part corresponds to an event tagged by the user as being stressful, while in Figure 9 there is an untagged short-term increase in the GSR level. In both cases, the GSR level does not return to the original baseline after passing the peaks. The question is whether this is due to continuous stress (because the user is still busy with what has happened) or to some other factors.
For some series we learnt from the users' feedback that certain patterns were caused by environmental factors or user activity context. In Figure 10 the person is exercising between 12:00hr and 13:00hr. The effect of the exercises is clearly visible in the GSR time series. Moreover, due to the form and the intensity of the peaks, we can discriminate those from genuine stress.
These context-dependent patterns will be important in the overall stress detection task. Knowing whether a person relaxes after a stressful event or whether he or she experiences anticipatory stress is very important. Here we do not handle these contexts explicitly.
B. Approach
The main task is to determine whether the observed portion of the signal contains a change that corresponds to an arousal. Formulating this problem as a change detection task on univariate time series, we consider a four-step approach for arousal detection as shown in Figure 11. In the preprocessing phase we take the raw GSR sensor data and, according to the operational settings (i.e. offline vs. online), perform its filtering, aggregation and discretisation. The processed data is served to a change detection technique.
The purpose of arousal detection can be twofold. The first is to obtain labels for the supervised learning process aimed at finding relationships between stress occurrences and external events or factors causing stress. In this case we can perform change detection in offline settings, i.e. the complete data series can be used in the preprocessing and detection steps. The second purpose is to use an online detection mechanism in online or semi-online settings as an alarm for making the user aware of stress (and possibly asking for feedback that can be related back to the subjective labeling process, i.e. the user can confirm or reject the alert). Although we do not fix the purpose of the task in this paper, we only describe an online method that detects arousal for a point in time that might be as much as a minute in the past.
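As a rough illustration of the preprocessing stages named above (filtering, aggregation, discretisation), the following Python sketch chains a trailing median filter, a block maximum, and a five-level discretisation. It is a minimal sketch under stated assumptions, not the implementation used in the study: the function names are hypothetical, the median window is assumed to be trailing (causal), and the discretisation is a simplified SAX-style step (z-normalisation followed by the standard Gaussian breakpoints for an alphabet of size 5) rather than the full procedure of [8].

```python
from statistics import mean, median, pstdev

# Breakpoints that cut a standard normal into 5 equiprobable regions
# (the usual SAX lookup values for an alphabet of size 5).
BREAKPOINTS = (-0.84, -0.25, 0.25, 0.84)


def median_filter(y, n):
    """Trailing median over the last n samples (assumption: causal window)."""
    return [median(y[max(0, i - n + 1): i + 1]) for i in range(len(y))]


def aggregate_max(y, m):
    """Collapse each complete block of m consecutive samples to its maximum."""
    return [max(y[i:i + m]) for i in range(0, len(y) - m + 1, m)]


def discretise(y):
    """Map values to levels 1..5 relative to the window they come from."""
    mu, sd = mean(y), pstdev(y)
    if sd == 0:
        return [3] * len(y)  # flat window: assign the middle level (assumption)
    return [1 + sum((v - mu) / sd > b for b in BREAKPOINTS) for v in y]


def preprocess(raw, n=100, m=240):
    """Filter, aggregate, then discretise, in that order."""
    return discretise(aggregate_max(median_filter(raw, n), m))


if __name__ == "__main__":
    # Synthetic 4 Hz signal: 2 minutes low, 2 minutes high,
    # plus one noise spike and one single-sample gap.
    signal = [1.0] * 480 + [3.0] * 480
    signal[100] = 20.0
    signal[300] = 0.0
    print(preprocess(signal))
```

With a 4 Hz signal, a block size of 240 samples yields one value per minute. Applying the median filter before the block maximum matters in this sketch: taking the maximum of the unfiltered block would let a single noise spike determine the aggregated value.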
Preprocessing. The three preprocessing steps that we use [...] purposes in Figure 12. The main objective of the preprocessing phase is to remove noise from the GSR time series. The first type of noise is due to poor contact between the sensors and the skin (see Figure 6). If the contact is not sufficient, the sensor will not measure anything. The second type of noise is a local disturbance of the signal (see Figure 5). These local [...] other points. [...] task is to catch the transition from normal GSR levels to aroused levels. This transition is characterized (for a typical stress pattern) by a sudden peak in the GSR level. The filter should filter out local disturbances while maintaining the typical peaks. Therefore, the noise is filtered out by using a median filter [14]. This is a filter that is used in image processing, and it preserves edges (as opposed to e.g. a moving average) while filtering out noise.
Fig. 12. (Panels: (a) Raw GSR signal; (b) Filtered GSR signal; two further panels plot GSR and GSR (SAX) against time in minutes.)
Fig. 13. An illustration that discretising the data with SAX does not immediately give us information about a change in arousal, e.g. by taking a maximal value of the current window. The blue circles indicate the changes alerted by such an approach. The red triangles indicate the change points alerted by the ADWIN change detection method taking the SAXified time series as an input. ADWIN is considered below.
The preprocessing step is applied to windows within the window of kept data. Let $\bar{y} = (y_{t_c}, \ldots, y_{t_{curr}})$ be the portion of kept data from either the start or the last change point ($y_{t_c}$) until the most recent sample ($y_{t_{curr}}$). The filter computes the filtered values $\bar{y}' = f(\bar{y})$ over a moving window of size $n$ ($n = 100$ in the experiments) from $\bar{y}_1$ until $\bar{y}_k$, where $f$ is the filter function and $k = t_{curr} - t_c$. Each consecutive block of $m$ samples in $\bar{y}'$ is aggregated to one value, $\bar{y}'' = g(\bar{y}')$. In the last preprocessing step, this data is discretised using SAX [8] into a discrete time series with levels from 1 to 5, $\mathrm{SAX}(\bar{y}'')$. The levels can be interpreted as being levels of stress (1: completely relaxed and 5: maximum arousal). However, they should not be interpreted as absolute levels of arousal, but rather as a local relative measure of arousal. Note that discretisation of the time series does not lead to an easy identification of the change points (see Figure 13 for an illustrative example). However, the discretisation can help the change detector to be more accurate.
The signals are measured with a sampling frequency of 4 Hz, yet it does not make sense to expect the stress detection to have timing requirements in the order of tenths of seconds. For this reason, we aggregate the data to the order of minutes. We use $m = 240$ in the experiments, thus after the aggregation step one sample point $\bar{y}''_i$ corresponds to 1 minute. In the experiments, we took $\bar{y}''_i = \max(\bar{y}'^{\,block}_i)$. As said, in this setup it is important that the aggregation step is applied after the filter in order to avoid the influence of local noise.
The discretisation using SAX is done online in a progressive way. That is, the SAX representation is recomputed over the historic data as new instances come within the training window.
Change detection. Change detection in time series has been a topic of interest in different domains. Existing approaches can be divided into two broad groups of techniques. Techniques from the first group are based on monitoring the evolution of performance indicators like classification model accuracy or some property of the data. Cumulative Sum (CUSUM), introduced in [11] and recently used in [17], is one of the statistical process monitoring mechanisms. This method monitors the mean of the input data (which can also be any filter residual) and gives an alarm when it is significantly different from zero, i.e. deviates from the normal process behaviour. Other methods rely on time series forecasting techniques such as Neural Networks and Auto Regression functions [15] that estimate parameter changes online based on an offline mapping.
Techniques from the second group are based on monitoring distributions on two different time-windows: a reference window summarizing past information and a window over
the most recent examples. Statistical tests based on the Chernoff bound, which decide whether samples drawn from two probability distributions are different, were studied in [6]. ADaptive WINdowing (ADWIN) [2], which we use in our experimental study, keeps a variable-length window of recently seen data points. It tries to keep the window of the maximal length that is still statistically consistent with the hypothesis that there has been no change in the mean signal value inside the window.
Thus, we consider two different approaches for change detection. Both approaches are aimed at finding statistically significant changes in data. The first approach, which we call Fit here, is based on monitoring the model error, and the second approach, ADWIN, is based on monitoring the data signal itself. Both approaches were recently used for change detection in the task of online prediction of the fuel mass flow in a boiler [13].
Fit: Performance monitoring-based change detection with the non-parametric test. In this study we assume that the general pattern of arousal resembles the curve shown in Figure 3. We also assume that there is no global model that predicts the general GSR signal for a person. Instead of using a global model in combination with statistical change detection methods, we opt for a method that computes local models.
If we assume that the stress level of a person is stable in between changes, the changes can be detected by monitoring the error of a locally fitted model. Given historic (preprocessed) data, the objective is to fit a simple regression model. Based on the observed Mean Squared Error for the incoming points, we can apply a statistical test (e.g. the Mann-Whitney U test [9]) to determine whether a significant change in the prediction error has occurred.
Every time a new point arrives, the data is split into two sets. The first set is a reference set that excludes the new point. The second set is a test set that includes the new point. For each of the two sets a model is trained while iteratively leaving out one of the points. When there is an overall significant difference between the two sets, it is considered to be a change point and a cut is made.
ADWIN: Change detection based on raw data using adaptive windowing. The ADWIN method works as follows: given a sequence of signals, it checks whether there are statistically significant differences between the means of each possible split of the sequence. If a statistically significant difference is found, the oldest portion of the data backwards from the detected point is dropped and the splitting procedure is repeated until there are no significant differences in any possible split of the sequence. More formally, given the GSR data stream, suppose $a_1$ and $a_2$ are the means of the two subsequences as a result of a split. Then the criterion for a change detection is $|a_1 - a_2| > \epsilon_{cut}$, where
$$\epsilon_{cut} = \sqrt{\frac{1}{2a}\log\frac{4k}{\delta}}, \qquad a = \frac{1}{\frac{1}{k_1} + \frac{1}{k_2}}, \qquad (1)$$
here $k$ is the total size of the sequence, $k_1$ and $k_2$ are the sizes of the two subsequences, respectively, and $\delta$ is the confidence parameter of ADWIN [2].
III. EXPERIMENTAL STUDY
In this section we present the results of the experimental study conducted on real GSR data collected during the recent pilot field study. First, we give a concise description of the constructed dataset and experiment setup, and then provide a summary of the quantitative evaluation and some highlights of the qualitative analysis of interesting cases.
TABLE I
DATA SET SUMMARY.
Number of users                      5
Number of time series                72
Time series per user (mean)          14
Mean length (samples)                98721
Number of change points overall      368
Mean change points per series        6.5
Dataset description. Table I summarizes the main characteristics of the data set. The data consists of the GSR data measured on five persons in the course of four weeks. The data was collected from a watch-like device worn by the persons during working hours. Since the sampling rate is 4 Hz and the typical working day is roughly 8 hours, the average length of the raw time series is 98721. Altogether the data set contained 72 time series. 16 time series were excluded from the experiments for either of two reasons: the GSR level showed very low variation or the contact of the sensors was not sufficient to yield a usable signal (these were detected automatically by a filter and then verified by visual inspection).
For each of the remaining 56 time series we annotated the change points based on visual inspection. Overall the set of time series contains 368 change points with an average of around 6.5 change points per time series.
The users who participated in the study were instructed to annotate any meeting in their agenda (MS Outlook Calendar) with information about their feeling towards the meeting ("nice", "exciting", "neutral", "annoying", or "tense"). Although this information was available, it was not used in this investigation. The reason for this is that the primary objective in this work is to detect GSR peaks; however, a lot of the peaks do not correspond to any meeting recorded in the agenda. Moreover, the actual stress related to a meeting does not necessarily show up at the time of the meeting. It might precede the event (see Figure 7) or continue to influence the person afterwards (see Figure 8). In the ideal case, these labels reflect the state transitions as shown in Figure 4, but in reality it is hard to discern the separate state changes.
Therefore, instead of using the working agenda annotations provided by the users, we used manually added labels based on visual inspection of the GSR time series. In the experimental study presented in this paper, we labeled only the change points, i.e. from the problem formulation perspective, each point is labeled as either a change point or not, and this label is what our arousal detection approach tries to detect based on the already observed GSR values.
Fig. 14. A flat signal followed by a high peak. On the down-curve of the high peak there are many smaller peaks that are more difficult to detect. (Legend: GSR, label, Fit, ADWIN.)
TABLE II
TP AND FP RATES OF DETECTING THE CHANGE POINTS. THE MEAN μ VALUES ARE PERCENTAGES WITH RESPECT TO PERFECT DETECTION.
          μ(TP/P)   σ(TP/P)   μ((TP+FP)/P)   σ((TP+FP)/P)
Fit       0.66      0.16      1.66           0.16
ADWIN     0.08      0.01      1.01           0.1
TABLE III
THE DISTANCE BETWEEN THE TIME OF THE ACTUAL CHANGE (t_a) AND THE TIME OF THE DETECTION (t_d).
          μ(|t_a − t_d|)   σ(|t_a − t_d|)
Fit       2.8              0.54
ADWIN     2.5              1.2
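The quantities reported in Tables II and III (TP/P, (TP+FP)/P and |t_a − t_d|) can be computed along the following lines. This is only a sketch under assumptions: the text does not state how detections are matched to labelled change points, so the greedy one-to-one matching and the tolerance `tol` used here are hypothetical choices, as are the function and key names.

```python
def evaluate(actual, detected, tol):
    """Match detections to labelled change points and summarise the result.

    actual   -- times of the labelled change points (P = len(actual))
    detected -- times at which the detector raised an alarm (TP + FP)
    tol      -- maximum |t_a - t_d| for a detection to count as a hit
                (assumption: not specified in the text)
    """
    unmatched = sorted(detected)
    distances = []
    for ta in sorted(actual):
        candidates = [td for td in unmatched if abs(td - ta) <= tol]
        if candidates:
            # Take the closest free detection; each detection is used once.
            td = min(candidates, key=lambda d: abs(d - ta))
            unmatched.remove(td)
            distances.append(abs(td - ta))
    p = len(actual)
    tp = len(distances)
    return {
        "tp_over_p": tp / p,
        "tp_plus_fp_over_p": len(detected) / p,
        "mean_distance": sum(distances) / tp if tp else None,
    }


if __name__ == "__main__":
    print(evaluate(actual=[10, 50, 90], detected=[12, 49, 70, 200], tol=5))
```

A ratio (TP+FP)/P above 1 then simply means that the detector raised more alarms than there are labelled change points.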
Experiment setup and evaluation. On each of 56 time series we perform three steps: preprocess the data as discussed in the [...]
There are two reasons why the True Positive rate is low for ADWIN. The first is that it does not detect small peaks. The second is that it also does not detect the change in cases where the signal is slowly rising or falling (like in Figure 17).
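To make the ADWIN criterion of Eq. (1) concrete, the sketch below implements a naive version of the splitting procedure described in Section II: every split of the current window is tested, and when the two means differ by more than the threshold, the portion before the split is dropped. It is a simplified illustration, not the bucket-based algorithm of [2] and not the code used in the experiments. The function names are hypothetical, the threshold is assumed to include the square root used in the standard ADWIN bound, the default `delta` is arbitrary, and the input is assumed to be rescaled to [0, 1].

```python
import math


def eps_cut(k1, k2, delta):
    """Threshold of Eq. (1) for sub-windows of sizes k1 and k2."""
    k = k1 + k2
    a = 1.0 / (1.0 / k1 + 1.0 / k2)
    return math.sqrt(math.log(4.0 * k / delta) / (2.0 * a))


def find_split(window, delta):
    """Return the first split index where the means differ significantly."""
    total = sum(window)
    left = 0.0
    for k1 in range(1, len(window)):
        left += window[k1 - 1]
        k2 = len(window) - k1
        a1 = left / k1
        a2 = (total - left) / k2
        if abs(a1 - a2) > eps_cut(k1, k2, delta):
            return k1
    return None


def detect_changes(stream, delta=0.01):
    """Feed values one at a time; return the indices at which a change is signalled."""
    window, alarms = [], []
    for i, value in enumerate(stream):
        window.append(value)
        split = find_split(window, delta)
        while split is not None:
            if not alarms or alarms[-1] != i:
                alarms.append(i)
            window = window[split:]  # drop the oldest portion before the split
            split = find_split(window, delta)
    return alarms


if __name__ == "__main__":
    step = [0.1] * 50 + [0.9] * 50
    print(detect_changes(step))
```

Because the threshold shrinks only as both sub-windows grow, a change has to be either large or sustained before the difference of means exceeds it. This is consistent with the observation above that small peaks tend to go undetected.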
statistics from the calendar, e-mail correspondence and social media [18].
An additional source of information is the similarities or differences between persons. Each person will handle stress