Chapter
1 Introduction
The context and driving forces for the research done for this thesis are explained in this
chapter.
It outlines the primary goals of the study and emphasizes the significant contributions
made by the suggested methodologies.
This chapter also includes the general thesis framework outline and the related
publications.
Several health-related problems can be easily detected when automated diagnostic techniques
are integrated into health monitoring equipment. In order to facilitate effective detection of
possible abnormalities, an algorithm should be developed for automated analysis and
classification purposes. The main goal of the thesis is to provide algorithms for the efficient
pre-processing, feature extraction, and machine learning-based analysis of biological signals, such
as PPG and EEG. The algorithms are specifically designed to work with real-time signals and
with different standard databases that are publicly available.
Health-related problems have been growing at an alarming rate in recent years, irrespective of the age, sex, geographical
location, food habits, lifestyle, emotional complications and job profile of individuals [1.1]. They are
among the leading causes of death worldwide, and the current scenario demands the
attention of researchers in this domain. Expert medical advice states that prompt medical care and
early detection can significantly lower heart and brain related complications, which in turn lower the risk
of death [1.2]. The traditional methods of diagnosis depend on skilled medical professionals visually
examining the biological signals and the diagnostic reports. However, the enormous amount of data that
needs to be analyzed, the time required, the detection error, and the scarcity of medical experts limit the
application of this technique. More significantly, worse survival rates are experienced in underdeveloped
nations due to restricted access to clinical knowledge and expensive diagnostics, which frequently cause
treatment delays.
In recent times, there has been a notable focus on research for the creation of sophisticated personal
health monitoring devices that incorporate automated diagnostic procedures [1.3 - 1.6]. The devices'
affordability offers easy accessibility for the general public, while their portability allows regular health
monitoring even from the comfort of home without the need for specialized medical intervention [1.5,
1.6]. A sophisticated automated monitoring system like this can lower the risk of death by ensuring early
and reasonably priced diagnosis for a large population. Early detection of various abnormalities is
ensured by the automated signal analysis algorithms, which integrate the diagnostic intelligence required
for such devices. Hence, the major focus of this thesis is the development of automated analysis
techniques for the prediction and classification of different abnormalities and symptoms.
Personal healthcare devices must use non-invasive, affordable, and readily acquirable biological signals
in place of sophisticated and expensive diagnostic techniques to make inferences about the subject's
health. The most favored methods for initial level diagnosis in such devices continue to be the
Photoplethysmogram (PPG), which records the optical changes in blood volume related to the heart's
pumping, Electroencephalogram (EEG), which records the electrical activity of the brain and
Electrocardiogram (ECG), which records the electrical activity of the heart [1.6].
The automated techniques developed so far are mainly based on the ECG signal, owing to the simplicity and
consistency of its morphology. However, the signal acquisition setup is challenging because
multiple electrodes must be placed on the subject. This is often uncomfortable for
the subjects and complicates the recording procedure. This problem can be overcome by developing
automated diagnostic algorithms based on the PPG signal instead of the ECG signal. The advantage of the PPG
signal lies in the operation of the sensor, which can be fitted to the fingertip of a subject without
causing any discomfort [1.7]. Moreover, a single sensor is sufficient to record the signal attributes, which
simplifies the overall acquisition procedure.
The respiration signal is another very important biological signal that has so far not received the same
attention as the ECG and EEG signals. However, this signal contains many vital signature properties
which can have important diagnostic utility in the detection and monitoring of several abnormalities
related to the heart, the lungs and other physical and mental conditions such as tension, stress and
emotions, among others [1.8], [1.9]. The respiration signal is a simple periodic signal which can be acquired using a
respiration sensor fitted to the chest of a subject by means of a belt. The belt must be fitted
tightly to ensure proper contact between the sensor and the chest, which can sometimes cause some
discomfort for the subjects. This problem can be avoided by using a PPG sensor instead of a respiration
sensor, since the respiration signal remains embedded in the acquired PPG signal. This retains the
acquisition advantages mentioned above. An additional advantage is that a single sensor
can be used to acquire both signals, which opens up the possibility of multimodal analysis of a
single signal.
The EEG signal has its own significance, which cannot be replaced by any other signal owing to its origin and
its impact on the everyday life of human beings. The brain is the most complex organ of the human body and is
responsible for all voluntary and involuntary actions. EEG signals contain numerous signature
properties, of which only a small percentage has been explored to date. The major challenge lies in
the non-stationarity of the signal and its composition of several sub-band frequencies which change
almost every instant for any individual. The acquisition of the EEG signal is the most challenging of all
the biological signals because several electrodes need to be placed on the human scalp
at the positions defined by the 'International 10-20 electrode placement system'. Despite
the acquisition complexity, researchers are focusing on the analysis of the EEG signal because it has the
potential to reveal valuable symptoms of numerous mental and physical problems and associated
complications [1.10]. Extensive research is currently under way in the domains of emotion recognition,
mental stress detection and applications in Brain Computer Interfaces (BCI) [1.11].
Emotion is a very complex mental state or process of human beings, which reflects human perceptions
and attitudes and plays an important role in communication between people. Research on emotion
recognition is of great value in human-computer interaction applications. If a human-computer
interaction system can quickly and accurately identify human emotions, the interaction process
will be more friendly and natural. In the present day scenario, the primary challenge is to provide access
to basic medical facilities to the low-income populations living in remote and rural areas with vastly
compromised medical infrastructure. Consequently, the development of different less complicated
surrogate techniques for accurate detection of health conditions at the preliminary stage and at an
affordable cost has become a global research priority.
The objective of the present thesis is to critically investigate the different statistical, time and frequency
domain properties of PPG and EEG signals in order to establish their diagnostic utility for the recognition and
classification of different emotions and for the identification and prediction of mental stress conditions in
human beings. The analysis of different emotions and mental stress is most effective if it is
carried out using non-invasive signal acquisition. Hence, the present work is mainly focused on the
processing of PPG and EEG signals. Apart from these two signals, additional signals such as the
respiration signal are also very powerful, since respiration carries critical information about the breathing rate, which
is often affected by stress or emotional state. Heart rate variability can also be a valuable source of
information for predicting stress conditions, and so the ECG can also be a vital signal for the present study.
Any analysis of a biomedical signal begins with pre-processing of the acquired
signal, followed by feature extraction and finally classification based on the extracted features. The
resulting features are then used, via standard techniques, for the assessment of different diseases
or the extraction of vital parameters. Therefore, in the present research, the whole endeavor has been
divided into four parts as follows:
Photoplethysmogram (PPG)
Electroencephalogram (EEG)
PPG is the optical technique of measuring the blood volume changes in the tissues due
to cardiac cycle.
EEG is the electrical activity related to the synchronous activity of the neurons inside
the human brain.
The automated biomedical signal analysis and processing systems utilize the different signals
generated by the human body in order to diagnose the physical and mental state of an
individual. In order to provide patient comfort and mobility at reasonable costs, these systems
need to employ low-cost and non-intrusive signal acquisition methods. Photoplethysmogram,
Electroencephalogram and Electrocardiogram are the most widely used physiological signals.
The purpose of this chapter is to give a brief summary of the importance of these
biomedical signals, along with an explanation of their physiological origin and
measuring conventions.
Because the photoplethysmogram (PPG) signal is non-invasive and reasonably
priced, it is frequently utilized in clinical and consumer devices [2.1]. Its main applications in
the past have been in the measurement of blood oxygen saturation and the monitoring of heart
rate in patients who are at rest. PPG signal processing has been used in clinical settings for
many years, but it is now a large and expanding field of study. The growing usage of PPG
sensors in consumer wearables has spurred research in this domain. The design of signal
processing algorithms faces a number of difficulties in this environment, including the
challenge of managing motion artifacts. Furthermore, the PPG signal contains important
information about the respiratory, cardiovascular, and autonomic nervous systems that is not
yet commonly utilized. All of these elements together make it possible to use
the PPG alone to provide comprehensive health information in everyday situations. The
creation of reliable PPG signal processing algorithms is a crucial first step towards taking advantage of
this possibility.
Neural activity in the human brain begins between the seventeenth and twenty-third week of
prenatal development. It is thought that the electrical signals produced by the brain from this early stage and
throughout life reflect not just the function of the brain but also the state of the entire body. This
assumption serves as the motivation for applying sophisticated digital signal
processing techniques to electroencephalogram (EEG) signals measured from the brains of human
subjects. Although no attempt is made here to discuss the physiological aspects of brain
activity in detail, a number of questions need to be addressed regarding the nature of the
original sources, their actual patterns, and the properties of the medium. The medium defines
the path from the neurons, which function as signal sources, to the electrodes, which are
sensors that measure combinations of the sources. On the other hand, for those who work with
these signals for the detection, diagnosis, and treatment of brain disorders and the related
diseases, an understanding of neuronal functions and neurophysiological properties of the brain
as well as the mechanisms underlying the generation of signals and their recordings is essential.
2.2.1 EEG Signal
The EEG signal represents the currents that flow during synaptic excitation of the dendrites of
many pyramidal neurons in the cerebral cortex. When brain cells (neurons) are activated, synaptic
currents are produced within their dendrites. The differences in electrical potential are caused by
the summed postsynaptic graded potentials from pyramidal cells, which form electrical dipoles
between the soma (the body of a neuron) and the apical dendrites (Figure 2.6). Most of the current
in the brain is generated by pumping the positive ions sodium (Na+), potassium (K+) and calcium
(Ca++), and the negative chloride ion (Cl-), through the neuron membranes in the direction
determined by the membrane potential [2.20].
The human head consists of several layers, including the scalp, the skull, the brain, and other thin
layers in between. The skull attenuates the signals about a hundred times more than
the soft tissue. Moreover, most of the noise is generated either over the scalp (also
known as external noise or system noise) or within the brain (also known as internal noise).
Consequently, only large populations of active neurons can generate enough potential to be
recorded by the scalp electrodes. These signals are then greatly amplified for display purposes. By
the time the central nervous system (CNS) is complete and functional at birth, about 10^11 neurons
have developed [2.21]. This corresponds to an average of 10^4 neurons per cubic millimeter. Neurons
are interconnected into neural networks through synapses.
An adult has roughly 5 × 10^14 synapses. While the number of neurons declines with age, the
number of synapses per neuron grows. The brain can be divided into three regions based on
its anatomy: the brainstem, cerebellum, and cerebrum (Figure 2.7). The cerebrum consists of the
left and right hemispheres, whose highly convoluted surface layer is the cerebral cortex.
The areas of the brain responsible for initiating movement, conscious awareness of
sensation, complex analysis, and emotional and behavioral expression are all located in the
cerebrum. The cerebellum coordinates voluntary muscle movements and maintains balance.
The brain stem controls involuntary functions such as respiration, heart rate regulation,
biorhythms, and neuro-hormone and hormone secretion [2.22].
It is evident from the section above that the study of EEG opens the door to the identification of
a wide range of neurological conditions as well as other anomalies in the human body. The
following clinical issues can be investigated using the obtained EEG signal from humans (and
also from animals) [2.22, 2.23]:
The above compilation demonstrates the great potential of EEG analysis and highlights the
necessity for sophisticated signal processing methods to support the physician in interpreting the
results. The EEG rhythms that should be detectable in EEG recordings are
described below.
The frequency range of delta waves is 0.5–4 Hz. These waves are primarily associated with deep
sleep but can also occur in the waking state. It is very easy to confuse the genuine delta response
with artifact signals caused by the large muscles of the neck and jaw. This is because
these muscles are located close to the skin's surface and generate strong signals, whereas the signal
of interest originates deep within the brain and is heavily attenuated as it passes through the skull.
Nevertheless, by applying basic signal analysis techniques to the EEG, it is fairly straightforward
to determine when the response is caused by excessive movement.
Theta waves lie between 4 and 8 Hz. The term "theta" may have been chosen to allude to
a presumed thalamic origin. Theta waves appear as consciousness slips towards drowsiness.
They have been linked to deep concentration, creative inspiration, and access to unconscious
material. A theta wave is often accompanied by other frequencies and appears to be related to the
level of arousal. It is well known that the alpha
wave of experienced meditators gradually decreases in frequency over extended periods of time.
Theta waves play an important role in infancy and childhood. Larger amounts of theta wave activity
in awake adults are abnormal and are caused by a variety of pathological problems. Changes in the
theta rhythm are also examined in studies on maturation and emotion [2.25].
Alpha waves appear in the posterior half of the head and are usually found over the occipital
region of the brain. They can be detected in all parts of the posterior lobes. The frequency
range for alpha waves is 8–13 Hz, and they commonly appear as a rounded or sinusoidal waveform.
Occasionally, however, they may appear as sharp waves. Alpha waves are thought to
indicate a state of relaxed awareness without any attention or concentration. The alpha
wave is the most prominent rhythm in the whole realm of brain activity and may cover a wider
range than previously accepted. A peak is frequently observed in the beta wave range, at
frequencies even up to 20 Hz, that has the characteristics of an alpha wave state rather than a beta
wave. Similarly, a response is often observed at 75 Hz which appears in an alpha setting.
Because most subjects produce some alpha waves with their eyes closed, it has been suggested
that the alpha wave is merely a waiting or scanning pattern produced by the
visual regions of the brain. It is reduced or eliminated by opening the eyes, by hearing unfamiliar
sounds, by anxiety, or by mental concentration or attention. The amplitude of an alpha wave
is normally less than 50 μV and is higher over the occipital areas. The origin and physiological
significance of the alpha wave remain unclear, and more research is needed
to understand how this phenomenon originates in cortical cells [2.26].
A beta wave is the electrical activity of the brain in the range of 14 to 26 Hz
(although some literature does not specify an upper bound). In healthy people, the beta wave is the
usual waking rhythm of the brain, associated with active thinking, active attention, external focus,
and problem-solving. A high-level beta wave may be acquired when a person is in a state of panic.
Rhythmic beta activity is observed mainly over the frontal and central regions. Importantly,
a central beta rhythm is related to the rolandic mu rhythm and can be blocked by motor activity or
tactile stimulation. The amplitude of the beta rhythm is normally less than 30 μV. Like the mu
rhythm, the beta wave can also be enhanced by a bone defect and in the areas surrounding
tumors.
Frequencies above 30 Hz, mainly up to 45 Hz, constitute the gamma range, sometimes called the
fast beta wave. These rhythms have very small amplitudes and occur rarely,
but their detection can be used to confirm certain brain diseases.
The regions with high EEG frequencies and the highest levels of cerebral blood flow, oxygen
and glucose consumption are located in the frontal and central area. The gamma wave band has also
been shown to be a useful indicator of event-related synchronization (ERS) of the brain, and it can
be used to demonstrate the locus for movement of the right and left index fingers, the right toes,
and the rather broad and bilateral area for movement of the tongue [2.27].
The normal brain rhythms and their typical amplitude levels are depicted in Figure 2.8.
In general, EEG signals are projections of brain activity that has been attenuated by the
leptomeninges, cerebrospinal fluid, dura mater, bone, galea, and the scalp. Cortical
discharges have amplitudes of 0.5 to 1.5 mV, with spikes reaching several millivolts. On the scalp,
however, the amplitudes commonly lie between 10 and 100 μV. These rhythms may persist as long as
the subject's state does not change, and they are therefore approximately cyclic in nature.
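The band limits quoted above can be collected into a small lookup table. The following is a minimal Python sketch and not part of the original work: the names EEG_BANDS and band_of are hypothetical, and treating each band as a half-open interval is an assumption made here only to resolve shared edges such as 4 Hz and 8 Hz. Frequencies that fall in the gaps left by the quoted ranges (for example between 13 and 14 Hz) are deliberately reported as unassigned.

```python
# Band limits in Hz as quoted in the text; the half-open convention [low, high)
# is an assumption of this sketch.
EEG_BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (14.0, 26.0),
    "gamma": (30.0, 45.0),
}


def band_of(freq_hz):
    """Return the name of the EEG band containing freq_hz, or None if no band matches."""
    for name, (low, high) in EEG_BANDS.items():
        if low <= freq_hz < high:
            return name
    return None


for f in (2.0, 6.0, 10.0, 20.0, 40.0, 13.5):
    print(f, band_of(f))
```

Such a table is typically the first step before computing per-band power from a spectrum, because it keeps the band definitions in one place.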
This chapter briefly describes the different processing techniques which are
normally adopted for the elimination of artifacts from biomedical signals, along with the
conventional signal processing methods available in the literature.
Several artifacts are normally picked up by the acquisition system
during the recording of biomedical signals.
These artifacts are a challenge to researchers as they remain embedded in the
signals and their amplitude and frequency components overlap with those of the
original signal.
This chapter presents the conventional signal processing techniques which are used for
eliminating different artifacts from the acquired signals.
Once the artifacts are eliminated, certain features need to be extracted from the
clean signals using different signal processing steps.
A detailed review of the signal processing techniques used for feature extraction is
provided in this chapter.
3.1 Review of Artifact Removal from PPG Signal
PPG signal processing is currently a wide field of research [2.9], [3.1], [3.2] since PPG sensors are
widely used in both clinical and consumer devices. Access to this field is facilitated by the
many publicly available datasets comprising PPG signals along with reference parameters. An
overview of the PPG signal processing techniques is given in this section.
The design method determines the coefficients based on the required cut-off frequencies and the kind of
filter [3.3]. In order to produce the filtered signal, y, this transfer function can also be represented as a
difference equation that can be simply applied to the original signal in the time domain. The expression
for the difference equation is:
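In its standard textbook form, the difference equation can be written as shown below. This is a reconstruction under stated assumptions rather than the original typeset equation: x denotes the input signal, k the sample index, M and N the numbers of feedforward and feedback terms, and the coefficients are taken to be normalized so that a[0] = 1. Only y, b[m] and a[n] are named in the text; the remaining symbols and the normalization are assumptions.

```latex
y[k] = \sum_{m=0}^{M} b[m]\, x[k-m] \;-\; \sum_{n=1}^{N} a[n]\, y[k-n]
```

For an FIR filter the second sum vanishes, so the output depends only on current and past input samples.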
Finding the b[m] and a[n] coefficients that provide the appropriate filter response is the critical step in the
design process of a digital filter [3.3]. Because digital filters can significantly affect the morphology of
PPG signals, it is crucial to design them with the final use in mind [3.4].
The two main families of digital filters are Infinite Impulse Response (IIR) and Finite Impulse Response
(FIR) filters, which differ in their transfer functions. A filter can be low-pass (LPF), high-pass
(HPF), band-pass (BPF), or band-stop (BSF) with a specific type, order, and cut-off frequency [3.3]. The
steepness of the transitions between the pass bands and the stop bands is
determined by the order. Higher-order filters have steeper transitions, but they require a longer
input signal and introduce a larger delay in the filtered signal. Low-pass and
high-pass filters have one transition band and one cut-off frequency, whereas band-pass and
band-stop filters have two transition bands and two cut-off frequencies.
The Moving Average (MA) filter is the most often used FIR filter in the field of biomedical signal
processing, including pre-processing of the PPG signal [3.1]. Other widely used filters include FIR
filters designed with Hamming windows, which can be created for any required filter type, and the
median filter, a related non-linear filter which outputs the median value of the most recent n samples
rather than their mean value. IIR filters were originally designed from analog filter prototypes
[3.3]. The Butterworth filter, type I and II Chebyshev filters, and elliptic filters are the most often used
IIR filters [3.1]. The primary distinction between these designs lies in their frequency response.
In contrast to elliptic and Chebyshev filters, Butterworth filters have a gentler transition slope for
a given order but offer ripple-free pass and stop bands, which is typically required in biomedical
signal analysis [3.3]. However, although there are no standards for PPG signal filtering,
fourth-order Chebyshev type II filters have been proposed by Liang et al. as being more
effective in enhancing PPG signal quality [3.3].
Other sophisticated methods, including de-noising with the Wavelet transform, have been suggested as
filtering options [3.3], [3.4].
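The difference between averaging and taking the median over the most recent n samples can be illustrated on a short test signal. The following is a minimal sketch using only the Python standard library; the function names, the window length of three samples and the test signal (a flat baseline with one spike artifact) are assumptions chosen for illustration and do not come from the cited works.

```python
from statistics import median


def moving_average(x, n):
    """Mean of the most recent n samples (fewer at the start of the signal)."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - n + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def median_filter(x, n):
    """Median of the most recent n samples (fewer at the start of the signal)."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - n + 1): i + 1]
        out.append(median(window))
    return out


# Hypothetical test signal: flat baseline with a single spike artifact at index 5.
signal = [1.0] * 10
signal[5] = 11.0

averaged = moving_average(signal, 3)
medianed = median_filter(signal, 3)
print(averaged)
print(medianed)
```

In this example the moving average spreads the spike over three output samples at reduced height, whereas the median filter removes it entirely, which is why median filtering is often preferred for isolated impulsive artifacts.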
The delay that a filtering technique introduces into the output signal is one of its fundamental characteristics.
When filtering the signal offline, the whole signal is available, so this delay can be removed by
applying the filter in both the forward and reverse directions (zero-phase filtering). Real-time
zero-phase filtering is not achievable, though. As a result, the signal is always delayed by
real-time filtering. The order of the filter affects this delay. FIR filters typically impose longer
delays because they require higher orders to achieve similar results. However, in real-time
applications that can tolerate greater delays, FIR filters might be chosen over IIR filters, because,
unlike IIR filters, which typically have non-linear phase, FIR filters can have linear phase,
meaning that the delay is the same at all frequencies and of a known value.
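The effect of forward and reverse filtering on the delay can be sketched with a very simple filter. The example below is an illustration under stated assumptions, not the filter used in this work: a first-order IIR low-pass with a hypothetical smoothing factor of 0.3 is applied to a synthetic triangular pulse, first causally and then in both directions.

```python
def lowpass_iir(x, alpha):
    """First-order IIR low-pass: y[k] = alpha * x[k] + (1 - alpha) * y[k-1]."""
    y, prev = [], 0.0
    for sample in x:
        prev = alpha * sample + (1.0 - alpha) * prev
        y.append(prev)
    return y


def zero_phase(x, alpha):
    """Apply the same filter forward, then backward over the reversed output."""
    forward = lowpass_iir(x, alpha)
    backward = lowpass_iir(forward[::-1], alpha)
    return backward[::-1]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical test pulse: a triangle centred on sample 50.
pulse = [max(0.0, 10.0 - abs(k - 50)) for k in range(101)]

causal = lowpass_iir(pulse, 0.3)
offline = zero_phase(pulse, 0.3)

print("input peak:", argmax(pulse))
print("causal peak:", argmax(causal))
print("zero-phase peak:", argmax(offline))
```

The causal output peaks later than the input, while the forward-backward output peaks at the original position. The price is that the whole record must be available, which is why the approach is restricted to offline processing.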
There are no established standards for choosing the PPG filter cut-off frequencies. The majority of the
PPG signal's frequency content is below 15 Hz, yet the application has a significant influence on the
cut-off frequencies chosen for PPG signal analysis. Cut-off frequency selection is frequently the result of
a design compromise; for example, a lower low-pass cut-off frequency may enable individual pulse
waves to be more easily recognized, but it may also distort their shape.
(b) Identification of the fiducial points on each pulse wave and its derivatives so that the
features of the pulse waves can be computed.
(c) Computation of the pulse wave characteristics, such as amplitude, time, and
shape.
Identifying individual pulse waves for study is the initial step in the time-domain analysis of the PPG.
This task is difficult for the following reasons: (i) noise and low frequency physiological fluctuations
can induce perturbations in the signal; (ii) individual pulse waves can exhibit two distinct peaks,
especially in young healthy participants [3.82], [3.83]. Numerous techniques have been put forth to
detect individual pulse waves. The majority of techniques rely on identifying the systolic peak
because it is typically the most noticeable characteristic [3.84]. Typically, methods involve four steps: (1)
signal filtering to highlight the desired PPG components; (2) pulse extraction; (3) peak or onset
identification; and (4) peak or onset correction [3.85].
Three popular methods for identifying peaks are as follows: (i) using thresholds to detect peaks [3.86],
(ii) identifying peaks as zero-crossing points on the first derivative and combining this with an adaptive
thresholding scheme [3.87], and (iii) identifying peaks using the slope function and combining it with
adaptive thresholding [3.88]. Additional techniques include: (i) locating the PPG signal's maxima and
minima points [3.82]; (ii) locating points on the PPG's first [3.83], [3.89] and second derivatives [3.89];
(iii) locating the PPG's positive slopes [3.75], [3.89]; and (iv) employing more sophisticated methods like the
Wavelet transform [3.90], [3.91] and the local maxima scalogram [3.92].
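The idea of combining first-derivative zero crossings with an amplitude threshold can be sketched as follows. This is a simplified illustration and not the algorithm of any of the cited works: the threshold is fixed at half the signal maximum instead of being adaptive, and the synthetic signal (Gaussian systolic and diastolic peaks, 100 Hz sampling, 0.8 s pulse period) and all function names are assumptions.

```python
import math


def synthetic_ppg(fs=100, beats=5, period=0.8):
    """Hypothetical PPG-like signal: a systolic and a smaller diastolic peak per beat."""
    n = int(round(beats * period * fs))
    x = []
    for k in range(n):
        t = k / fs
        v = 0.0
        for b in range(beats):
            v += math.exp(-0.5 * ((t - (0.2 + b * period)) / 0.05) ** 2)
            v += 0.4 * math.exp(-0.5 * ((t - (0.5 + b * period)) / 0.05) ** 2)
        x.append(v)
    return x


def detect_peaks(x, rel_threshold=0.5):
    """Samples where the first difference changes from positive to non-positive
    and the amplitude is at least rel_threshold times the signal maximum."""
    threshold = rel_threshold * max(x)
    d = [x[k + 1] - x[k] for k in range(len(x) - 1)]
    peaks = []
    for k in range(1, len(d)):
        if d[k - 1] > 0 and d[k] <= 0 and x[k] >= threshold:
            peaks.append(k)
    return peaks


fs = 100
ppg = synthetic_ppg(fs=fs)
peaks = detect_peaks(ppg)
intervals = [b - a for a, b in zip(peaks, peaks[1:])]
heart_rate_bpm = 60.0 * fs / (sum(intervals) / len(intervals))
print(peaks, heart_rate_bpm)
```

The threshold rejects the smaller diastolic maxima, so only one peak per beat is retained. On real recordings a fixed threshold is usually too fragile, which is the motivation for the adaptive schemes mentioned above.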
(b) Identification of the fiducial points on each pulse wave and its derivatives so that the
features of the pulse waves can be computed.
Finding points of interest on each pulse wave is the second stage in time-domain analysis. Typically
referred to as fiducial points, these are discrete locations that are discernible on the pulse wave or its
derivatives. Fiducial points that are frequently used are shown in Figure 3.1. Referring to Figure 3.1(a),
the pulse onset, the systolic peak, the dicrotic notch, and the diastolic peak are the important
fiducial points on the original PPG pulse
wave. Figure 3.1(b) displays the maximum point on the first derivative, which denotes the original
signal's maximum slope point. Four points in systole are identified on the second derivative as the
a-, b-, c-, and d-waves (Figure 3.1(c)), and the e-wave can be used to locate the dicrotic notch
[3.82]. On the third derivative, the points p1 and p2 can be found (see Figure 3.1(d)).
The study of the PPG signal's morphology using features taken from the pulse wave form and its
derivatives is known as pulse wave analysis. Many morphological characteristics that are mostly derived
from the amplitude and width of the PPG wave contour have been studied in the literature. These
features measure the properties of pulse waves to gather data on the state of the cardiovascular system.
The systolic amplitude, i.e. the height of the PPG from the baseline to its peak on the original pulse wave,
has been linked to stroke volume [3.81] and has also been proposed as a blood pressure measure [3.93].
The ratio of pulse area [3.94] and the width of the PPG at half its amplitude [3.95] are two possible
markers of total peripheral resistance.
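Two of the features mentioned above, the systolic amplitude and the width at half amplitude, can be computed from a single pulse wave as sketched below. This is a minimal illustration with several assumptions: the baseline is taken as the first sample of the pulse (the pulse onset), the pulse is assumed to have a single dominant peak, and the Gaussian test pulse and function names are hypothetical.

```python
import math


def systolic_amplitude(pulse):
    """Height from the baseline (assumed to be the onset sample) to the peak."""
    return max(pulse) - pulse[0]


def width_at_half_amplitude(pulse, fs):
    """Time in seconds between the first and last samples at or above half amplitude."""
    baseline = pulse[0]
    half_level = baseline + 0.5 * (max(pulse) - baseline)
    above = [k for k, v in enumerate(pulse) if v >= half_level]
    return (above[-1] - above[0]) / fs


# Hypothetical single pulse: Gaussian centred at 0.3 s, sampled at 100 Hz.
fs = 100
pulse = [math.exp(-0.5 * ((k / fs - 0.3) / 0.1) ** 2) for k in range(80)]

amplitude = systolic_amplitude(pulse)
width = width_at_half_amplitude(pulse, fs)
print(amplitude, width)
```

Because the width is measured between sample positions, its precision is limited to one sampling interval on each side; interpolating the half-amplitude crossings would refine it.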
The crest time (CT) has been used to distinguish cardiovascular disorders [3.96]. Several blood pressure
indicators have been proposed, including diastolic time, CT, pulse width from the systolic and diastolic
sections of the pulse wave, and their ratios [3.97], [3.98]. A correlation between the brachial-ankle pulse
wave velocity and the reflection index has been proposed [3.99]. The large artery stiffness index has
been found to be correlated with age and mean arterial blood pressure [3.100]. Pulse wave
analysis commonly makes use of the amplitude ratios of the b-, c-, d-, and e-waves with respect to the
a-wave on the second derivative. A number of characteristics derived from the
second derivative have been found to be helpful markers for evaluating cardiovascular health.
For example, the c/a, d/a, and e/a indices are believed to decrease with age, whereas the b/a index
increases; these changes are thought to reflect greater arterial stiffness [3.78], [3.101]. Moreover,
the b/a and c/a ratios have been found to be helpful in differentiating between hypertensive
patients and healthy controls [3.102]. Furthermore, the b/a ratio has been proposed as a helpful indicator of
atherosclerosis and altered arterial distensibility [3.103]. Other features derived from the second
derivative have also been proposed, including the aging index, defined as (b-c-d-e)/a [3.78]. A different aging
index, (b-e)/a, was suggested in [3.101] for cases where the c- and d-waves are not evident.
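The ratios and aging indices described above reduce to simple arithmetic once the wave amplitudes have been measured. The sketch below assumes that the a- to e-wave amplitudes are already available from the second derivative; the function name and the numerical amplitudes in the usage example are hypothetical and are not measured data.

```python
def second_derivative_indices(a, b, c, d, e):
    """Amplitude ratios relative to the a-wave and the two aging indices from the text."""
    if a == 0:
        raise ValueError("a-wave amplitude must be non-zero")
    return {
        "b/a": b / a,
        "c/a": c / a,
        "d/a": d / a,
        "e/a": e / a,
        "aging_index": (b - c - d - e) / a,      # (b-c-d-e)/a
        "aging_index_alt": (b - e) / a,          # (b-e)/a, for missing c- and d-waves
    }


# Hypothetical amplitudes in arbitrary units, for illustration only.
indices = second_derivative_indices(a=1.0, b=-0.8, c=-0.2, d=-0.3, e=0.1)
for name, value in indices.items():
    print(name, round(value, 3))
```

Normalizing by the a-wave makes the indices independent of the absolute signal scale, which varies with sensor contact and skin properties.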
Through a technique called "pulse decomposition analysis," the PPG pulse wave can be broken down
into incident and reflected waves. This makes it possible to extract features from the separate waves that
comprise the pulse wave as a whole. The incident wave is extracted by assuming that it is mostly
responsible for the systolic upslope. To create a symmetrical wave, the
systolic upslope is mirrored about the systolic peak. This symmetrical wave is then either fitted with a Gaussian to
model the incident wave [3.104]–[3.106], or it is itself taken to be the incident wave [3.107]. The
incident wave is then subtracted from the original pulse wave, leaving a residual that is primarily composed
of one or more reflected waves. The next step is to extract these reflected waves by modeling them as
Gaussian curves or symmetrical waves. Pulse wave features can then be obtained from the timings
and amplitudes of the individual wave peaks, the widths of the individual waves, and the time
intervals between their peaks [3.3].
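The first stage of this procedure, in the variant where the mirrored upslope is itself taken as the incident wave, can be sketched in a few lines. The example below is an illustration under stated assumptions rather than a reproduction of the cited methods: the test pulse is a hypothetical sum of two Gaussians, no Gaussian fitting is performed, and only the first reflected wave is located.

```python
import math


def decompose_pulse(pulse):
    """Mirror the systolic upslope about the peak and subtract it from the pulse."""
    peak = max(range(len(pulse)), key=pulse.__getitem__)
    incident = []
    for k in range(len(pulse)):
        if k <= peak:
            incident.append(pulse[k])              # upslope, kept unchanged
        elif 2 * peak - k >= 0:
            incident.append(pulse[2 * peak - k])   # upslope mirrored about the peak
        else:
            incident.append(0.0)                   # beyond the mirrored upslope
    residual = [p - q for p, q in zip(pulse, incident)]
    return peak, incident, residual


def gaussian(k, centre, sigma, amplitude):
    return amplitude * math.exp(-0.5 * ((k - centre) / sigma) ** 2)


# Hypothetical pulse: incident wave at sample 30 plus a reflected wave at sample 60.
pulse = [gaussian(k, 30, 8, 1.0) + gaussian(k, 60, 10, 0.4) for k in range(100)]

peak, incident, residual = decompose_pulse(pulse)
reflected_peak = max(range(len(residual)), key=residual.__getitem__)
print(peak, reflected_peak, round(residual[reflected_peak], 3))
```

The time difference between the two located peaks and the ratio of their amplitudes are examples of the features referred to above.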
The Fourier transform is widely used to calculate frequency spectra of PPG signals. The Fourier transform
maps the signal's information to the frequency domain using sinusoidal waves [3.108]. Three parameters
can be used to describe sine waves: phase, frequency, and amplitude. As such, they are useful for
determining a spectrum's phase and magnitude components. These components are shown for a clean PPG
signal in Figure 3.2. The Fast Fourier transform (FFT) algorithm was used to obtain these results, as
it computes the discrete Fourier transform at reduced processing cost [3.108]. Most of the signal
content of the PPG lies below 15 Hz, and the heart rate appears as a prominent peak in
the spectrum at about 1 Hz.
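To illustrate how the heart rate shows up as a spectral peak, the sketch below computes a magnitude spectrum with a naive discrete Fourier transform and picks the largest peak. It is only an illustration: the signal is synthetic, the 32 Hz sampling frequency and the 0.5 Hz lower search limit are assumed values, and a practical implementation would use an FFT routine rather than this O(N^2) loop.

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitude of the one-sided DFT of a real sequence (naive O(N^2) form)."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        xk = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(xk))
    return mags


def dominant_frequency(x, fs, f_min=0.5):
    """Frequency (Hz) of the largest spectral peak at or above f_min."""
    mean = sum(x) / len(x)
    mags = dft_magnitude([v - mean for v in x])      # remove the DC offset first
    df = fs / len(x)                                 # spacing between bins in Hz
    k_min = math.ceil(f_min / df)
    k_peak = max(range(k_min, len(mags)), key=lambda k: mags[k])
    return k_peak * df


# Synthetic PPG-like signal: 1 Hz fundamental plus a weaker 2 Hz harmonic.
fs = 32.0                                            # assumed sampling frequency
signal = [math.sin(2 * math.pi * 1.0 * t / fs)
          + 0.3 * math.sin(2 * math.pi * 2.0 * t / fs) for t in range(256)]
heart_rate_hz = dominant_frequency(signal, fs)
heart_rate_bpm = 60.0 * heart_rate_hz
```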
Spectral analyses are influenced by the PPG recording's duration and sampling frequency. First, the
maximum frequency accessible in the frequency spectrum (fs/2) is determined by the sampling
frequency (fs). As a result, the sampling frequency ought to be greater than twice the highest frequency
which is relevant. Second, the length of the recording affects the resolution of the resulting frequency
spectrum, which is determined by fs and the quantity of data points utilized in the spectrum calculation.
Zero-padding the signal [3.108] can increase the number of data points in situations where a finer
resolution is required, such as when evaluating autonomic nervous system activity, which often occurs in
very low frequency regions from about 0.04 to 0.4 Hz [3.110].
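The two dependencies described above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the 128 Hz sampling frequency and the 1024-point spectrum are assumed values, not settings taken from this work.

```python
import math


def nyquist_frequency(fs):
    """Highest frequency (Hz) available in the spectrum at sampling frequency fs."""
    return fs / 2.0


def bin_spacing(fs, n_points):
    """Spacing (Hz) between adjacent frequency bins of an n_points spectrum."""
    return fs / n_points


def points_for_spacing(fs, target_spacing):
    """Smallest number of points whose bin spacing is no coarser than target_spacing."""
    return math.ceil(fs / target_spacing)


fs = 128.0                                  # assumed sampling frequency in Hz
f_max = nyquist_frequency(fs)               # upper limit of the spectrum
spacing_1024 = bin_spacing(fs, 1024)        # bin spacing of a 1024-point spectrum
n_needed = points_for_spacing(fs, 0.04)     # points needed for 0.04 Hz bin spacing
```

Zero-padding raises the number of points and therefore makes the frequency grid finer, but the ability to separate two closely spaced components remains limited by the duration of the recorded signal.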
If only the magnitude component is needed, then other methods can be applied to obtain the frequency
spectrum. The power spectral density (PSD), often called the power spectrum, and Welch's periodogram
approach are frequently applied. The PSD is based on the power (squared magnitude) of the Fourier
transform. Welch's periodogram segments the PPG signal, computes a PSD for each segment, and
produces the final spectrum by averaging the segment spectra [3.108].
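The segment-and-average idea behind Welch's periodogram can be sketched as follows. This is a simplified illustration rather than a reference implementation: the Hann window, the 50% overlap, the per-segment mean removal, and the scaling (no one-sided doubling) are assumptions made here, and the test signal is synthetic.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def segment_psd(segment, window, fs):
    """Windowed periodogram |X(k)|^2 / (fs * sum(w^2)) for bins 0..N/2."""
    n = len(segment)
    mean = sum(segment) / n
    x = [(v - mean) * w for v, w in zip(segment, window)]   # remove mean, then taper
    scale = fs * sum(w * w for w in window)
    psd = []
    for k in range(n // 2 + 1):
        xk = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        psd.append(abs(xk) ** 2 / scale)
    return psd


def welch_psd(x, fs, seg_len, overlap=0.5):
    """Average the periodograms of overlapping segments of x."""
    step = max(1, int(seg_len * (1.0 - overlap)))
    window = hann(seg_len)
    spectra = [segment_psd(x[s:s + seg_len], window, fs)
               for s in range(0, len(x) - seg_len + 1, step)]
    if not spectra:
        raise ValueError("signal is shorter than one segment")
    psd = [sum(col) / len(spectra) for col in zip(*spectra)]
    freqs = [k * fs / seg_len for k in range(len(psd))]
    return freqs, psd


fs = 32.0                                                    # assumed sampling frequency
signal = [math.sin(2 * math.pi * 1.0 * t / fs) for t in range(256)]
freqs, psd = welch_psd(signal, fs, seg_len=64)
peak_freq = freqs[max(range(len(psd)), key=lambda k: psd[k])]
```

Shorter segments give more spectra to average, and hence a smoother estimate, at the price of a coarser frequency grid.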
In certain applications where the signal's behavior is well understood, modern methods for obtaining
spectral information can be advantageous since they try to lessen the impact of noise on the resulting
spectrum [3.108]. Certain techniques rely on moving average models (MA), autoregressive models
(AR), or an amalgamation of both (ARMA). The power spectrum can be estimated using AR models in a
variety of ways, including the Yule-Walker, Burg, covariance, and modified covariance methods
[3.108]. Other contemporary non-parametric methods of spectrum analysis are based on eigenanalysis
frequency estimation, which uses singular value decomposition (SVD) to divide the signal into
correlated and uncorrelated signal components [3.108]. The multiple signal classification (MUSIC)
algorithm is one such example [3.108]. The application, the signals that are available, and the data that
has to be extracted from the spectra should all be taken into consideration when selecting the techniques
and parameters for obtaining a frequency spectrum.
Little work to date has exploited the PPG signal's potential for accurate emotional state recognition.
Most reported studies instead use the PPG signal in conjunction with other
modalities such as galvanic skin response (GSR) [3.123], electromyography (EMG), respiration rate
(RSP), and EEG, and classify emotions from time-domain and spectral features with lower accuracy.
The work in [3.124] uses a complex classifier to examine the spectral power feature of four distinct
EEG bands in order to study the valence and arousal types of emotions. Park et al. [3.125] classify
happiness and sadness using time-domain PPG characteristics and skin temperature. The authors of
[3.126] attempted to categorize thirteen emotions using a feature fusion and wavelet-based approach.
According to [3.127], the setup for the identification of emotional states combines a complex collection
of classifiers with statistical features taken from the ECG, GSR, and PPG signals. In [3.128], PPG and
GSR signals are combined and used through two classifiers to identify eight distinct emotional states with
a maximum average accuracy of 92%. Goshvarpour et al. [3.130] used descending accuracy analysis of
the Poincare's indices produced from the PPG signals in an attempt to classify only three unique
emotions. In addition to the multimodal use of the PPG signal [3.123 – 3.128], most of the
aforementioned methods either identify a small number of emotional states [3.123 – 3.125], [3.127],
[3.129], or use sophisticated bio-signal properties [3.123 – 3.124], [3.126], [3.127], [3.129], or
complicated classification algorithms [3.123 – 3.129]. Accurate emotion recognition using the PPG
waveform features alone is therefore still lacking, even though these features show sufficient
promise.
The clinical significance of PPG signal is justified from the wide variety of applications
as found in the literature.
4.1 Overview
Emotion is a complex mental state that often mirrors attitudes and views in people. Accurate
identification of emotional states and their characteristics is essential for diagnosing serious illnesses and
designing appropriate treatment plans. EEG-based analysis, which is multi-lead and complex, is typically
used to characterize emotion detection. These days, the photoplethysmogram (PPG) signal's wearable
features, comprehensive cardiac information, and ease of usage are being utilized to determine emotional
states. However, most documented emotion detection systems use PPG signals only within multimodal
approaches. This chapter presents a straightforward methodology that uses only the PPG
signal analysis to identify multiple emotional states. The blood ejection rate typically varies in response
to emotion-induced heart rate changes, which in turn creates a departure in the systolic and diastolic
phase balance. As a result, a particular time-domain characteristic is found to measure this imbalance, and
its variability is then employed as a feature in a threshold-based classification method to distinguish
between the five most prevalent emotional states. The algorithm achieves an average detection accuracy
of 97.78% when tested on PPG data gathered from the standard DEAP dataset. The higher
accuracy values, when compared to previous literature, demonstrate the usefulness of the suggested
algorithm for the sole PPG signal-based detection of different emotional states. Its potential for usage in
practical healthcare applications is further supported by the utilization of a single PPG characteristic and
the adoption of a straightforward threshold-based categorization method.
Researchers nowadays are searching for various substitute methods to attain a less complicated yet
accurate emotional state detection along with implementation promises. Consequently, the PPG signal's
affordable, simple-to-implement, and easy-to-acquire qualities have long piqued researchers' curiosity.
The PPG signal's non-invasive mechanism primarily uses an electro-optical method to estimate the
relative blood volume change for each cardiac cycle. This mechanism can monitor the blood volume
change at various peripheral places on an individual, such as the tip of a finger, earlobe, toe, etc. [4.17].
An infrared source and detector assembly is used in the fundamental operation, where light from the
source travels through tissues and cells before hitting the detector. The amount of blood flowing during
each cardiac cycle determines how much light is partially absorbed. Numerous recent studies have
previously made use of the intrinsic cardiac and blood circulation-related facts encoded in the PPG signal
to extract a variety of cardiac parameters [4.18] and to identify a few crucial cardiac defects [4.19 – 4.21].
In reality, it is discovered that the autonomic nervous system (ANS) correlates human brain activity with
heart functionality. Consequently, pathophysiological fluctuations generated by emotions impact not only
heart rate variability (HRV) but also blood ejection rate [4.22]. Given that every PPG beat denotes the
total blood volume change connected to each cardiac cycle, it is reasonable to assume that any
fluctuations in the blood ejection rate brought on by emotion will also be represented in distinct PPG
signal waveform regions [4.18]. Because PPG signals are easy to acquire and have inherent clinical
significance, they may be a better option when it comes to emotion identification.
In the current chapter, a streamlined method for identifying various emotional states is developed, based
on an examination of the PPG signal properties. The placement of the dicrotic notch reveals a particular
time-domain characteristic, and the suggested algorithm makes use of this characteristic's variation to
reflect the modulation of the physiological changes associated with PPG waveform corresponding to five
distinct emotional states. It is worth emphasizing that, to the best of our knowledge, accurate
detection of different emotional states using a single time-domain component of the PPG signal has not
previously been reported.
4.4 Methodology
The block schematic of Figure 4.2 presents the general methodological aspects of the present method for
PPG based identification of five distinct emotional states. The algorithm consists of four main steps: 1)
denoising of the PPG records obtained from the DEAP dataset; 2) precise identification of fiducial points
with minimal processing and amplitude normalization; 3) feature extraction; and 4) threshold-based
classification for the accurate identification of five emotional states.
The region between the onset and the dicrotic notch is considered the most critical part of the PPG beat,
given the impact of emotion on peripheral blood volume change. Variations in heart rate and in the
systolic and diastolic portions of the beats are found to alter this region in every PPG beat. Also, the
position and morphology of the dicrotic notch vary beat-wise, subject-wise, and emotion-wise, since the
effect of emotion is not static and the degree of emotion continues to vary depending on the subject's
physical or mental condition as well as the level of external stimuli. Ten PPG beats are therefore taken
into consideration for each emotion and subject in order to capture this emotion-induced fluctuation.
Figure 4.5 displays one such sample map of the area of interest from the PPG beats. Rather than taking
the area of the selected region (from the onset to the dicrotic notch) of a single pulse, the variability of
this region over ten consecutive beats is chosen as the single, most discriminating feature for categorizing
the five distinct emotional states across a group of subjects. The feature for this study is thus the
variance, computed for each record, of the summation of the indicated region, as shown in Figure 4.5.
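The feature described above can be sketched in a few lines. The sketch assumes that each beat is available as a list of amplitude-normalized samples and that the onset and dicrotic notch indices have already been detected; the function names are hypothetical, and the use of the population variance is an assumption, since the text does not state which variance estimator is used. Three short toy beats stand in for the ten beats used per record.

```python
from statistics import pvariance


def beat_area(beat, onset, notch):
    """Sum of the samples from the onset to the dicrotic notch (both inclusive)."""
    return sum(beat[onset:notch + 1])


def record_feature(beats, fiducials):
    """Variance, across the beats of one record, of the onset-to-notch sums.

    beats     : list of beats, each a list of amplitude-normalized samples
    fiducials : list of (onset_index, notch_index) pairs, one per beat
    """
    areas = [beat_area(beat, onset, notch)
             for beat, (onset, notch) in zip(beats, fiducials)]
    return pvariance(areas)          # assumption: population variance


# Toy example with three short beats (the study uses ten beats per record).
beats = [[0, 1, 2, 1, 0], [0, 2, 4, 2, 0], [0, 1, 2, 1, 0]]
fiducials = [(0, 3), (0, 3), (0, 3)]
feature = record_feature(beats, fiducials)
```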
4.4.6 Classification
According to the analysis of the extracted feature values from each subject, each emotional state under
consideration has a distinct range of feature values. This means that the five different types of emotional
states can be adequately classified using simplified threshold-based classification logic with a unique set
of decision boundary values. Two distinct classification techniques, which are explained below, are used
to further validate the general discrimination logic that was established.
The benefit and efficiency of the selected feature come from the fact that it permits a straightforward
threshold-based classification strategy in place of any intricate method. The maximum, minimum, standard
deviation, and average values of the acquired feature are used to calculate unique threshold values for
each emotion. These threshold values are then applied to the test dataset in order to categorize particular
emotions. Because five emotional states are to be separated in the current classification problem, only
four distinct threshold values, T1, T2, T3, and T4, are generated and used for classification. The
threshold rule is defined in terms of x, the feature value for a particular subject's
emotion. The feature value is compared with each of the four boundary values in turn, starting with the
smallest value and working up to the largest. In this comparison, the feature value falls into exactly one
of the five emotion classes: the predicted emotion class is the one whose boundary values enclose the
current feature value.
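The threshold rule described above can be sketched as a lookup over four sorted boundaries. The sketch rests on several assumptions: the class order (Love, Calm, Happy, Sad, Hate, from the lowest to the highest feature range) is inferred from the binary classification borders discussed later in this chapter, the lower boundary of each class is treated as inclusive, and the numeric thresholds are hypothetical placeholders rather than the values obtained in this work.

```python
from bisect import bisect_right

# Assumed class order, from the lowest to the highest feature range.
EMOTIONS = ["Love", "Calm", "Happy", "Sad", "Hate"]


def classify(x, thresholds):
    """Map feature value x to an emotion using four ascending thresholds T1..T4.

    x < T1 -> first class, T1 <= x < T2 -> second class, ..., x >= T4 -> last class.
    """
    t = list(thresholds)
    if len(t) != len(EMOTIONS) - 1 or t != sorted(t):
        raise ValueError("expected four thresholds in ascending order")
    return EMOTIONS[bisect_right(t, x)]


# Hypothetical threshold values, for illustration only.
T = (0.1, 0.2, 0.3, 0.4)
predictions = [classify(x, T) for x in (0.05, 0.15, 0.25, 0.35, 0.45)]
```

Because the thresholds are sorted, the class is found with a handful of comparisons and no iterative training is involved at prediction time.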
It is evident that using a single feature value lessens the complexity of classification. Furthermore,
it should be noted that the threshold-based technique, which classifies five emotional states based on a
single feature, is novel and has not, as far as we know, been reported before. Because only four distinct
threshold values are used, the overall computational time is negligible even though five
conditions must be checked for classification. These threshold values form the basis of the entire
classification; therefore, iteration is not necessary to achieve optimal results.
Following classification into five distinct emotional states, the performance for each emotional state in
relation to the other four emotions is evaluated in further detail for each of the computed threshold
values. In each scenario, one emotion is considered to be one class, while the remaining emotions are
considered to be the other class. For instance, in the first scenario, the emotion of love is treated as one
class, and all other emotions are treated as a second class. A decision boundary is selected, and the
threshold-based rule is used to classify the data. This is a binary classification problem: the model
classifies an emotion as belonging either to the Love category or to the other (non-Love) category.
Based on the background explained above, the variance of the area values between a PPG signal's onset
and dicrotic notch is expected to form distinct groups for the various emotional states. Figure 4.6 depicts
the box plot showing how the range of values for different emotions
varies noticeably. According to the classification of emotions, the positive valence emotion has the lowest
range of variance values and the dominant emotion has the largest. The variance values of the
other emotions lie between the ranges of positive valence and dominance.
Prior to classifying the emotional states, feature trend analysis verification is done to ascertain the
feature's trend of variation. This is regarded as a crucial component of any technique for recognizing
emotions since the trend of the extracted characteristic must be comparable across subjects in different
databases. While it is true that every emotion has a different effect on various subjects, low and high
arousal emotions as well as positive and negative valence emotions should exhibit a consistent pattern of
change. Figure 4.7 displays a typical plot of the feature values from ten randomly selected subjects,
demonstrating a consistent trend across the subjects. The feature values rise from positive valence
to strong emotional arousal.
Generally speaking, the following common statistical parameters, which are defined below, are used to
assess the effectiveness of the previously indicated classification technique:
Here, true positive, true negative, false positive, and false negative are denoted by the letters TP, TN, FP,
and FN, respectively. The positive class that has been accurately classified as positive is represented by
TP, while the negative class that has been correctly classified as negative is represented by TN. In a
similar way, FN denotes the positive class that is mistakenly categorized as negative, and FP indicates the
negative class that has been mistakenly classified as positive.
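Performance parameters of this kind are simple ratios of the four counts. Since the defining equations are not reproduced in this passage, the sketch below assumes the four commonly used parameters (accuracy, sensitivity, specificity, and precision); this choice, the function name, and the example counts are assumptions made for illustration and may differ from the parameters actually used in this work.

```python
def classification_metrics(tp, tn, fp, fn):
    """Common performance ratios computed from confusion-matrix counts."""
    def ratio(numerator, denominator):
        # Return NaN instead of failing when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),      # true positive rate
        "specificity": ratio(tn, tn + fp),      # true negative rate
        "precision": ratio(tp, tp + fp),        # positive predictive value
    }


# Hypothetical counts for one emotion against the remaining four.
metrics = classification_metrics(tp=30, tn=126, fp=2, fn=2)
```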
In the current study on emotion categorization, thirty-two PPG records are taken into consideration, and
ten consecutive beats per record are chosen for feature extraction. The total dataset is partitioned in an
approximately 70:30 ratio at the start of the classification procedure: thirty percent of
the data is set aside for testing and the remaining seventy percent is used for training.
Furthermore, a cross validation technique, eight-fold in this case, is used to reduce the risk of
overfitting, improve the trained model's accuracy for unseen feature values, and, most importantly, obtain
boundary values for the universal classification of every emotion. For the eight-fold cross validation, the
full data set is first divided into eight equal sets. Seven of these
eight sets are used to create the classifier model and the remaining set is used for testing. To
cover all test sets, the entire process is repeated eight times, so that every feature value
acquired is used in both the training and testing phases of the classification process. Unique threshold
boundaries (T1, T2, T3, and T4) are chosen for each test fold, and the related performance is noted.
Following eight-fold testing, the final decision boundary for the categorization of all the emotions is
determined by calculating the mean of the model boundary values obtained over the eight folds.
Figure 4.8 displays the final modeled classifier boundaries using a threshold-based rule for the five
chosen emotions. Additionally, Table 4.2 presents the results of evaluating the constructed classifier's
improved performance for multi-class classification of all the emotions under consideration using the
parameters indicated in Equations (4.2–4.5).
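The fold-wise estimation and averaging of the boundaries can be sketched as follows. The k-fold partition and the averaging follow the description above, but the rule that derives a boundary from the training data (the midpoint between the largest value of one class and the smallest value of the next) is a hypothetical stand-in, since the exact rule is not spelled out in this passage. The sketch also assumes that every training fold contains samples of every class, and it omits the per-fold performance evaluation.

```python
def kfold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def midpoint_boundaries(features, labels, classes):
    """Hypothetical rule: midpoint between adjacent classes (lowest class first)."""
    bounds = []
    for lower, upper in zip(classes, classes[1:]):
        lower_max = max(f for f, lab in zip(features, labels) if lab == lower)
        upper_min = min(f for f, lab in zip(features, labels) if lab == upper)
        bounds.append((lower_max + upper_min) / 2.0)
    return bounds


def cross_validated_boundaries(features, labels, classes, k=8):
    """Average the boundaries estimated on the training part of each fold."""
    per_fold = []
    for train, _test in kfold_indices(len(features), k):
        per_fold.append(midpoint_boundaries([features[i] for i in train],
                                            [labels[i] for i in train], classes))
    return [sum(column) / len(column) for column in zip(*per_fold)]


# Toy data: three well-separated classes, interleaved so each fold holds all classes.
classes = ["A", "B", "C"]
base = {"A": 0.0, "B": 10.0, "C": 20.0}
labels = [classes[j % 3] for j in range(24)]
features = [base[labels[j]] + 0.1 * (j // 3) for j in range(24)]
boundaries = cross_validated_boundaries(features, labels, classes, k=8)
```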
The whole dataset is split into two unequal parts before any one emotion is binary classified in relation
to all the other emotions. The first part contains the records related to a particular emotion (32×1 =
32), and the second part contains the remaining records derived from the other four emotional states
(32×4 = 128). Each of these two parts is further split into four subsets, three of which are used for
training and the fourth for testing. The classification process is performed four times so that each subset
serves once as the test set, and the average classification border is taken. This process is repeated for
each of the five binary classification models, and the results are displayed in Figure 4.9 for the
computation of the four distinct classification borders (B1, B2, B3, and B4). In this instance, the binary
class and multi class classifications have identical classification boundary values under different labels.
The classification border between "Love" and "Rest of the emotions" is B1, and the borders between
"Calm" and "Rest of the emotions" are B1 and B2. The boundary between "Happy" and "Rest of the
emotions" is formed by B2 and B3, and the boundary between "Sad" and "Rest of the emotions" is marked
by B3 and B4.
Finally, as seen in Figure 4.9, B4 delineates the boundary between "Hate" and "Rest of the emotions."
Using the parameters from Eq. (4.2–4.5), Table 4.3 displays the improved performance of the binary
classification strategy in this instance as well.
For instance, if "Love" is regarded as one class and the other emotions as the second class, TP represents
the number of true "Love" emotional states that are accurately classified as "Love". The
other emotions that are accurately classified as other emotional states (not in the "Love" category) are
denoted by TN. The other emotions that are mistakenly classified as "Love" are represented by FP.
Likewise, FN stands for "Love" states that are mistakenly classified as other emotions (i.e., not falling
under the "Love" group).
Figure 4.10 shows the representative confusion matrix for the threshold-based binary class classification,
which assigns Love to one class and the other emotions to another.
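The four entries of such a one-versus-rest confusion matrix can be counted directly from the true and predicted labels. The sketch below is illustrative; the function name and the short label lists are hypothetical.

```python
def one_vs_rest_counts(true_labels, predicted_labels, positive="Love"):
    """Count TP, TN, FP, and FN when one emotion is taken as the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive and predicted == positive:
            counts["TP"] += 1        # positive emotion recognized correctly
        elif truth != positive and predicted != positive:
            counts["TN"] += 1        # other emotion kept out of the positive class
        elif truth != positive and predicted == positive:
            counts["FP"] += 1        # other emotion mistaken for the positive one
        else:
            counts["FN"] += 1        # positive emotion assigned to another class
    return counts


# Hypothetical labels for six records.
truth = ["Love", "Love", "Calm", "Sad", "Hate", "Love"]
predicted = ["Love", "Calm", "Love", "Sad", "Hate", "Love"]
counts = one_vs_rest_counts(truth, predicted, positive="Love")
```

Note that a confusion between two non-positive emotions still counts as a true negative in this binary view, because neither label belongs to the positive class.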
The ROC curves for each of the classification methods provide additional evidence of the efficacy of the
procedure, and the area under the curve (AUC) value is calculated in each case. The results show that the
AUC values for the binary classification technique fall between 0.998 and 1, and for the multiclass
classification technique, they fall between 0.994 and 1. The effectiveness of the straightforward
threshold-based classifier is justified by the high AUC values shown in the ROC plots. The subsequent
Figure 4.11 and Figure 4.12 display the ROC curves and related AUC for binary class classification and
multi class classification, respectively.
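For a single-feature classifier, the AUC can be computed without drawing the ROC curve, since it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below uses this pairwise formulation; the feature values are hypothetical, and the negated feature is used as the score on the assumption that the positive class ("Love" here) occupies the lowest feature range.

```python
def roc_auc(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A higher score must indicate the positive class; ties count as half a win.
    """
    positives = [s for s, flag in zip(scores, is_positive) if flag]
    negatives = [s for s, flag in zip(scores, is_positive) if not flag]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical feature values: three "Love" records and four records of other emotions.
features = [0.05, 0.08, 0.12, 0.11, 0.20, 0.30, 0.40]
is_love = [True, True, True, False, False, False, False]
auc = roc_auc([-f for f in features], is_love)   # low feature value -> high score
```

In this toy example one "Love" record (0.12) overlaps with the other class (0.11), so one of the twelve pairs is ranked incorrectly and the AUC falls slightly below 1.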
It is noteworthy to emphasize that there are still very few studies that use PPG signal analysis alone to
detect numerous emotional states. Conversely, most of the approaches that have been suggested use PPG
signals in multimodality mode [4.28 – 4.33]. The drawback of multimodality mode is that it increases the
computing overhead of the algorithm by requiring distinct denoising and processing approaches and
extracting multiple features from various inputs. Different time-domain, frequency-domain, spectral
features, spectral power asymmetry, statistical, Poincare, or wavelet-based complex features are used by
the multimodality approaches as indicated in [4.28 – 4.34]. Moreover, a number of intricate classification
methods are employed to classify the emotional states, including Bayesian network, ridge regression,
Naïve Bayes method, Support Vector Machine (SVM), Multilayer Perceptron (MLP), k-nearest
neighbour (kNN), Meta-multiclass (MMC), and Random forest [4.28 – 4.33]. Finally, when evaluated
on the same dataset as used in this chapter, most methodologies, although using several bio-signals,
identify only a restricted number of emotional states [4.28 – 4.30], [4.32], [4.34] with poor
accuracy. Table 4.4 presents a comparative performance evaluation of all the previously discussed
methods with the suggested work. The evaluation makes it clear that, despite significant
methodological differences, the proposed method outperforms the literature listed there by classifying
five prominent emotional states using a single, easily extracted feature and a straightforward, easy to
implement threshold-based classification methodology.
4.7 Discussion
Researchers today are interested in the proper recognition of human emotional states by automated
systems, and many different physiological signals have been explored in this regard. A number of
additional signals, including the gold standard EEG signal as well as the ECG, EMG, GSR, and PPG,
have also been investigated for assessing an individual's emotional state. PPG, among other bio-signals,
has drawn particular interest from the scientific community because of its straightforward wave form,
inexpensive equipment, easier capture, integration of trustworthy pathophysiological data, and operator-
free processes. It is still uncommon to find a straightforward and effective method for recognizing
emotions using just the PPG signal. As a result, the current study proposes a straightforward, effective,
and manageable algorithm for the automatic detection of five different emotional states using only the
PPG signal. The five different emotional states are primarily classified using an easy-to-implement
threshold-based classification algorithm and an easy-to-extract feature. Since the emotion effect is
discovered to be at its peak within the first few seconds of the videos, ten consecutive PPG beats are
taken into account when computing the feature values. While a greater number of PPG beats may add to
the computational load, fewer PPG beats may not be able to adequately capture the emotional shifts. It is
noteworthy to observe that the suggested approach recognizes five key emotional states using just a single
feature.
On the other hand, PPG beats with sudden morphological variations frequently pose significant
challenges in identifying this moment. Future research on the PPG signal waveform should focus on
finding more pertinent elements that might be used in conjunction with the suggested algorithm to
recognize a wider range of emotional states. Because of its inherent physiological
properties, the PPG signal compares favorably with other bio-signals and techniques, including those
based on speech and facial expressions. Subjects have the ability to control their voice, and they can
even suppress their facial expressions to a certain level, which may result in faulty conclusions. However,
because a subject cannot willfully change the PPG signal, it can be used as a key signal in emotion
recognition techniques.
Russell's circumplex model is a commonly used emotional model that classifies various emotions based
on their valence, arousal, and dominance scale. The aforementioned feelings are intricate and frequently
overlap. For instance, it is difficult to distinguish between the feelings "relaxed" and "calm," while "sad"
and "depressed" overlap with each other. The overlapping nature of emotions makes the classification
extremely challenging when considering all of the feelings at once. Furthermore, the impact of a certain
emotion varies among individuals. The current chapter is based on five fundamental emotions selected
from the wide range of emotions in order to avoid this complication. The objective is to create an
automated algorithm that can quickly and easily identify emotions for use in real-time applications.
Multiple emotions will add to the computational load and make practical implementations challenging.
The current algorithm's future efforts will focus on identifying various emotional states of a person,
including psychological issues, which are sadly becoming more common in today's society. Furthermore,
the current algorithm makes it simple to provide continuous mental excitation monitoring for individuals
with mental illnesses. This study can be expanded to diagnose mental shock patients, which could assist
medical professionals in taking the appropriate action and lowering the risk of suicide attempts. The
algorithm could also be used to assess the mental state of a person who is unable to hear or speak. These
days, online gaming and other multimedia platforms are highly popular, and the suggested method may
be useful for understanding the user's mental response. Only five distinct emotions are used in the current
procedure, all of which are very universal and replicable in an experimental paradigm. Future iterations of
the algorithm will further enhance it to the point where a single algorithm can accurately recognize
several emotions, something that the literature currently lacks.
The extracted feature shows adequate potential for classifying the five emotional states
with a reasonable level of accuracy using a straightforward, linear threshold-based classification
algorithm. The primary reason for using this categorization technique is to reduce the computational load
and to improve implementation prospects. However, it is noted that in order to enhance the overall
effectiveness of the suggested strategy, the selected threshold-based classifier has to be further optimized
and assessed over a bigger dataset. The full testing process is carried out and verified by implementing
the algorithm in software; the current study does not use any hardware
implementation of the algorithm. High-tech personal assistance equipment and a wide range of other
real-life applications can benefit from the performance of the selected algorithm. The following is
a summary of some of the most important details about the suggested algorithm:
For this study, just five different emotions are taken into account. Numerous additional emotions can be
included in the algorithm. Given that mental stress is on the rise in today's culture, the approach could be
expanded to assess an individual's mental stress levels. All of the analysis for this study is done on an
offline software platform. Further development of the same algorithm could enable its use in Internet of
Things based systems, allowing for direct monitoring of the outcomes even from a remote place. The
classification process is carried out using a single, simple-to-calculate characteristic. In the future, a few
more features might be developed to make the algorithm stronger. Real-time PPG signals for particular
emotion stimuli can be captured in the lab, and the algorithm's performance can be examined.
4.8 Conclusion
It is clear that accurately identifying a person's emotional state using a streamlined computerized method
gives conventional human-machine interactions a new dimension and is useful for a wide range of
extremely complex applications. The scientific community is already aware that PPG signals can be a
useful substitute for current bio-signals in situations where patient comfort and ease of use are the main
concerns. Although the suggested method performs better, it only utilizes a small portion of the DEAP
dataset; therefore, its applicability to other applications needs to be investigated further. In the field of
HMI and other multimedia applications, where analysis is typically dependent on more complex signals
like EEG, ECG, etc., the usage of PPG-based emotion detection can be a viable substitute. Most crucially,
besides HMI applications, there are specific therapeutic situations involving some serious illnesses where
treatment planning would be simpler if the doctor were aware of the patient's emotional state beforehand.
When compared to other traditional methods, the suggested PPG-based emotion identification produces
superior results in certain situations. Overall, the usability of the algorithm in real-time applications is
supported by the simplicity of the selected methodology, the algorithm's quick execution, and the high
average accuracy that is obtained. It should be noted, however, that in the future, the proposed
methodology is meant to be improved by adding a number of additional important emotions with more
straightforward methodologies, so that the final algorithm can be used in other cutting-edge, wireless
HMI applications for the detection of some critical disorders, like schizophrenia, autism and others.
This chapter describes the utility of PPG signal for extraction of respiration signal
which remains embedded in it during acquisition.
Instead of using an additional respiration sensor, a single channel PPG sensor is
effective for recording respiration signal along with PPG signal.
The features obtained from PPG signal and from PPG extracted respiration signal
are used to estimate the mental stress conditions with respect to relaxed conditions
of the subjects.
5.1 Overview
Because of the increasing complexity of our society, mental stress is a part of life for everyone. Long-
term mental stress situations must be addressed as soon as possible since they have the potential to cause
a variety of chronic diseases. The current techniques for measuring mental stress based on
electroencephalograms (EEGs) are frequently intricate, multi-channel, and expert-dependent. Conversely,
the respiratory signal provides interesting insights into stress, but capturing it is difficult and necessitates
multimodal support. The respiratory signal can be separated from the photoplethysmogram (PPG) signal
in order to get around this problem. The suggested method for identifying the stressed state therefore
characterizes the readily obtained PPG signal in a multimodal manner. The developed algorithm specifically uses a significant
PPG feature and, through simplified methods, also extracts the respiratory rate from the same PPG signal.
PPG records obtained from the publicly available DEAP dataset are used to evaluate the approach. The
suggested algorithm's efficacy is evaluated using a straightforward threshold-based approach in
conjunction with conventional classification approaches. Its average accuracy for classifying stressed and
relaxed states is 98.43%. The present method performs better than the available approaches, and its low
acquisition load and straightforward methodology enable its deployment in standalone, real-time
personal healthcare devices. The use of basic features and a simple classification strategy supports the
suggested method's usefulness in healthcare applications.
5.3 Methodology
The general block diagram of the entire methodology is displayed in Figure 5.1. The four main parts of
the method are as follows: (1) removing high frequency and power line artifacts from PPG data from the
DEAP dataset; (2) obtaining the respiratory induced amplitude variation signal (RIAV) and calculating
the respiration rate; (3) extracting features from the clean PPG signal; and (4) using a threshold
rule-based classification to differentiate between the stressed and the relaxed conditions.
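Step (2) can be illustrated with a minimal sketch. It assumes a synthetic PPG-like signal, a naive local-maximum peak detector, and an upward mean-crossing count for the respiration rate. None of these choices is stated in the chapter, and all function names are hypothetical.

```python
import math

def synthetic_ppg(fs, duration_s, heart_hz, resp_hz, depth=0.3):
    """Toy PPG: a cardiac sinusoid whose amplitude is modulated by breathing."""
    n = int(fs * duration_s)
    return [(1.0 + depth * math.sin(2 * math.pi * resp_hz * i / fs))
            * math.sin(2 * math.pi * heart_hz * i / fs) for i in range(n)]

def find_peaks(x):
    """Indices of positive local maxima (one per cardiac beat for this toy signal)."""
    return [i for i in range(1, len(x) - 1)
            if x[i] > 0 and x[i - 1] < x[i] >= x[i + 1]]

def riav_series(x, peaks):
    """Respiratory-induced amplitude variation: the beat-to-beat peak heights."""
    return [x[i] for i in peaks]

def respiration_rate_bpm(riav, times_s):
    """Breaths per minute from upward mean-crossings of the RIAV series."""
    mean = sum(riav) / len(riav)
    centred = [v - mean for v in riav]
    ups = [times_s[k] for k in range(1, len(centred))
           if centred[k - 1] < 0 <= centred[k]]
    if len(ups) < 2:
        raise ValueError("need at least two breaths to estimate a rate")
    # (number of complete breaths) / (time they span), converted to per-minute
    return 60.0 * (len(ups) - 1) / (ups[-1] - ups[0])

# Assumed demo values: 100 Hz sampling, 72 beats/min, 15 breaths/min.
fs = 100
ppg = synthetic_ppg(fs, duration_s=60, heart_hz=1.2, resp_hz=0.25)
peaks = find_peaks(ppg)
rate = respiration_rate_bpm(riav_series(ppg, peaks), [i / fs for i in peaks])
print(f"estimated respiration rate: {rate:.1f} breaths/min")
```

Because the RIAV series is sampled only once per heartbeat, the estimate is quantized by the beat interval. A real recording would also need the artifact removal of step (1) before peak detection.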
5.3.4. Classification
The overall ability of the extracted features to discriminate between mental stress conditions and relaxed
conditions is then assessed using a simple threshold rule-based classifier as well as standard
classification approaches applied to the extracted feature values. Because the feature threshold values for
the chosen classifier are deduced from the feature boundary values, the feature space is appropriately
partitioned. To reduce the impact of classifier bias, a fourfold cross-validation technique is employed, in
which the entire data set is divided into four equal parts. The classifier is trained on three of these four
parts, and the fourth part is held out for testing. By repeating this procedure for every possible
combination, each part is used in both the training and testing stages. The same fourfold cross-validation
is used to compute the performance of other popular classification techniques, namely Logistic Regression
(LR), Linear Discriminant (LD), Support Vector Machine (SVM), and k-Nearest Neighbour (kNN), in order to
evaluate the efficacy of the selected features. Standard statistical parameters such as Accuracy,
Sensitivity and Specificity are used to evaluate the overall classification performance of all the
classifiers listed above, as indicated in [5.22].
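The threshold rule and the fourfold validation described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: a single scalar feature, a threshold placed midway between the class boundary values, interleaved folds, and made-up feature values. All names are hypothetical.

```python
def fit_threshold(values, labels):
    """Place the threshold midway between the two class boundary values.

    Assumes label 1 (stressed) has the larger feature values.
    """
    lowest_pos = min(v for v, y in zip(values, labels) if y == 1)
    highest_neg = max(v for v, y in zip(values, labels) if y == 0)
    return (lowest_pos + highest_neg) / 2.0

def predict(values, threshold):
    """Label a value 1 (stressed) when it exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]

def metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) as fractions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    se = tp / (tp + fn) if tp + fn else 0.0  # 0.0 when a fold has no positives
    sp = tn / (tn + fp) if tn + fp else 0.0  # 0.0 when a fold has no negatives
    return acc, se, sp

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(f, n, k)) for f in range(k)]

def cross_validate(values, labels, k=4):
    """Train on k-1 folds, test on the held-out fold, average the metrics."""
    results = []
    for test_idx in k_fold_indices(len(values), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(values)) if i not in held_out]
        thr = fit_threshold([values[i] for i in train_idx],
                            [labels[i] for i in train_idx])
        preds = predict([values[i] for i in test_idx], thr)
        results.append(metrics([labels[i] for i in test_idx], preds))
    return tuple(sum(r[j] for r in results) / k for j in range(3))

# Made-up, well-separated feature values (0 = relaxed, 1 = stressed).
values = [0.10, 0.20, 0.30, 0.15, 0.90, 0.80, 0.70, 0.85]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(cross_validate(values, labels, k=4))
```

The folds are interleaved so that, with this label ordering, every held-out fold contains one sample of each class. Real data would normally be shuffled or stratified first.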
5.4 Results and Validation
A stressed condition elicits substantial arousal, whereas a relaxed state does not. Figure 5.5 displays a
box plot that illustrates the overall discriminating power of the feature values.
The box-plot of the derived feature clusters supports the predictions of high arousal (stressed) and low
arousal (relaxed) emotional states made by various emotional models [5.23]. In addition, the overall
feature distribution and the non-overlapping feature values shown in box plot of Figure 5.5 indicate that,
in addition to other conventional classifiers, a simple threshold rule-based classifier is adequate for
correctly predicting the two classes. Table 5.1 presents the overall efficacy of the chosen classification
method for two distinct classes, exhibiting 98.43% accuracy, 96.96% sensitivity, and 100% specificity on
average. Table 5.2 displays the overall performance of the selected features when they are further
evaluated using traditional classifiers. Five standard metrics are used to assess the performance of the
classifiers: Accuracy, Sensitivity, Specificity, Precision and F1 score. Figure 5.6 displays the values of
these metrics obtained during the classification stages.
Table 5.3 also provides the confusion matrix for each classifier's classification stage. The high values of
the classifiers' evaluation metrics support the applicability and usefulness of the recommended method.
Furthermore, ROC curves in Figure 5.7(a) and Figure 5.7(b) demonstrate the superior performance of the
threshold-based classifier in comparison to other conventional classifiers. The usage of the suggested
features is further supported by the high area under the curve (AUC) values that were obtained for each
classifier.
5.6 Discussion
The PPG signal's multimodal characterization in this chapter makes it easy, automatic, and reliable to
identify mental stress. The key contributions of the proposed technique are enumerated below:
1) Unlike the multi-lead, complex EEG signal, the PPG signal is obtained with a single sensor.
2) Two basic features are constructed using the PPG signal and the respiration signal computed from the
same PPG signal.
3) The method does not require complex feature selection or ranking processes.
4) The use of linear classifiers with reasonable accuracy ensures the possibility of implementing AI-based
healthcare monitoring in real-world applications.
The novelty of the chapter lies in the multimodal evaluation of a single-channel PPG sensor, which is
patient-friendly, simple to acquire and operator-independent, and which contains crucial clinical
information. To the best of our knowledge, the use of two modalities of a single signal for stress
detection has not been studied previously. This technique has the added benefit of making the system
portable and user-friendly while also lowering the overall acquisition cost and complexity. The feature
values are computed in the present chapter using only twenty PPG beats of each record.
Using more PPG beats might raise the computing cost, whereas fewer PPG beats might not
adequately represent the changes brought on by stress. In the future, the signal length and the
subsequent analysis will need to be examined thoroughly against other databases with different signal
durations in order to improve classification accuracy.
Instead of complex classifiers, only simple linear classifiers are selected for
classification of the stressed and relaxed states. Nevertheless, additional optimization and assessment of
the chosen classifier across a larger dataset are required to improve overall performance. Furthermore, in
the future, the robustness of the features will be evaluated using a few advanced classifiers. The
developed algorithm is implemented and assessed in MATLAB; no hardware platform is
utilized in this study. The strong performance of the adopted algorithm supports its suitability
for a wide range of real-world applications in healthcare and associated fields, as well as for
state-of-the-art AI-based personal assistive devices.
5.7 Conclusion
This study presents a single multimodal characterization of the PPG signal as the basis for a robust,
automated, and easily implementable algorithm to identify the mental stress condition. The classification
result indicates that there is a notable variation in the PPG signal's distinguishing features when mental
stress is present. The results therefore not only support the importance of the PPG signal but also
indicate that it can be a useful tool for determining a subject's mental stress level.
Given its high average classification accuracy, sensitivity, and specificity, the suggested algorithm may
also be suitable for measuring mental stress in people from different populations. Owing to its
computational simplicity, the method may be implemented on a hardware system and used as a smart,
portable, standalone application based on artificial intelligence for the study and detection of mental
stress in remote rural locations.
This chapter describes the use of the EEG signal for detecting eyeball movements
in four directions along with eye-open and eye-closed conditions.
Instead of using an additional EOG electrode, EEG electrodes are used for
recording the eye movement signal embedded in the EEG signal.
The EEG data were recorded in the laboratory from various subjects, and the
developed algorithm is tested on these data for classification of eyeball movements.
6.1 Overview
In order to help the elderly and disabled people, modern human-machine interfaces (HMIs) use a variety
of human expressions. Depending on the kind of handicap, eye movements are frequently determined to
be the most effective means of communication for transmitting expressions. These days, eye movement
detection is done using Electroencephalogram (EEG) based setups, which are used to investigate
neurological conditions. However, most state-of-the-art EEG-based studies either use a large
feature dimension with limited classification accuracy or detect eye movements in fewer directions.
This chapter presents a robust, straightforward, and automated technique that classifies six distinct
types of eye movements through analysis of the EEG signal. To remove a variety of noise and artifacts,
the algorithm applies the discrete wavelet transform (DWT) to the EEG obtained from six distinct leads.
Next, two features per lead are extracted from the reconstructed wavelet coefficients and combined to
create a binary feature map. Finally, the six types of eye movements are classified using a
threshold-based method applied to a unique feature derived from the weighted sum of the binary map.
With only one feature value, the algorithm achieves high average accuracy (Acc), sensitivity (Se), and
specificity (Sp) of 95.65%, 95.63%, and 95.63%, respectively. The suggested approach has considerable
potential for implementation in personal assistive devices, as demonstrated by the results achieved with
a simple methodology and by comparison with other state-of-the-art methods.
6.3 Methodology
Figure 6.1 provides an overview of the methodology used in the suggested EEG-based eye-movement
analysis method. The algorithm consists of four essential components: 1) pre-processing the recorded
EEG data using wavelet transformation to remove various types of noise and artifacts; 2) extracting
features linked to eye movements from the wavelet coefficients; 3) binarizing the acquired feature set
and producing a binary feature map; and 4) calculating the weighted sum of the binary feature map and
classifying the eye movements using thresholds.
Prior to the commencement of the recording, all volunteers received an explanation of the complete
experimental procedure and were instructed to remain in a relaxed or idle state, with their eyes closed,
for around two minutes. Following this period of rest, the EEG signal was recorded for a further two
minutes while the subject's eyes remained closed. Next, without moving their heads, the subjects were
instructed to open their eyes and focus on a designated spot on the wall at the same horizontal level as
their eyes. The EEG signal was again recorded for two minutes. After that, the subjects were instructed to
look, in turn, at marks placed at the same horizontal level and angled roughly 30 degrees to the left and
to the right of the central (first) mark. A two-minute recording of the EEG signal was made for each
of these positions. Similar recordings were also made for marks approximately 30 degrees
above and below the central mark. For every subject, the complete EEG data collection process was
repeated with varying eye directions, as shown in Figure 6.3. In accordance with institutional and
international practice, each subject signed an informed consent form prior to the experiment.
Examination of the full database shows that the minimum duration of an eye blink is close to one second.
A sliding window is therefore chosen accordingly to remove the eye blink
artifacts. Starting from the onset of the eye blink, each time the window moves, the signal value at the
current sample instant is replaced by the average of the signal values one second before and one second
after that instant. This substitution is carried out for the duration of the eye blink.
The same process is then repeated by moving the sliding window from the end of the
current eye blink to the onset of the subsequent eye blink. Figure 6.8 displays different EEG
signal bands before and after the eye blink artifact was eliminated. Feature extraction is then performed
on the obtained clean EEG signal.
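One possible reading of the substitution step is sketched below. It assumes that the blink onset and end are already known as sample indices, that the sampling rate is an integer number of samples per second, and that the blink lasts no longer than one second so that both reference samples fall outside it. The function name is hypothetical.

```python
def remove_blink(signal, fs, blink_start, blink_end):
    """Replace samples in [blink_start, blink_end) by the mean of the samples
    one second before and one second after each instant.

    References are read from the original signal and clamped to its ends.
    """
    cleaned = list(signal)
    last = len(signal) - 1
    for i in range(blink_start, blink_end):
        before = signal[max(i - fs, 0)]
        after = signal[min(i + fs, last)]
        cleaned[i] = (before + after) / 2.0
    return cleaned

# Toy record at 4 samples/s: one second of baseline, a one-second blink, one second after.
eeg = [1.0] * 4 + [9.0] * 4 + [3.0] * 4
print(remove_blink(eeg, fs=4, blink_start=4, blink_end=8))
```

In this toy record the four blink samples are each replaced by the mean of a pre-blink and a post-blink sample, while all other samples are left untouched.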
These computed power levels are used to extract two features: the absolute power factor (APF) and
the weighted average power (WAP). For a given lead, APF is defined as the power of a certain band divided
by the total power of all the bands. For a particular EEG band, WAP is the total power of that band over
all the leads divided by the number of leads. The following formulas can be used to calculate APF
for a given EEG band (here delta, δ) of a specific lead and WAP for a given band (here alpha, α).
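Written out from the verbal definitions above, the two quantities take the following form. This is a reconstruction rather than a copy of the original typeset equations, and the lead index i in P_{α,i} is an assumed notation.

```latex
\mathrm{APF}_{\delta} = \frac{P_{\delta}}{P_{\delta} + P_{\theta} + P_{\alpha} + P_{\beta} + P_{\gamma}},
\qquad
\mathrm{WAP}_{\alpha} = \frac{1}{n} \sum_{i=1}^{n} P_{\alpha,i}
```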
where n represents the number of channels and P_δ, P_θ, P_α, P_β and P_γ denote the power of the delta,
theta, alpha, beta and gamma bands, respectively, for a given channel.
For better representation, the computed features as suggested by the aforementioned Eqs. (6.3) and (6.4)
are displayed in Table 6.2 after being rounded to two decimal places.
In Table 6.3, all of the cells with "1" values are displayed in black, while the remaining cells with "0"
values are shaded gray. Table 6.3 clearly demonstrates that, regardless of the subject,
the generated binary map displays a distinct discriminating combination for each eye position. The
outcome shows that, even without the use of classification logic, the produced binary feature map allows
for simple visual inspection-based discrimination of every eye condition. Next, each element is
given a positional weight value equal to its cell number in the binary
feature map. Table 6.4 displays a representative unified binary feature map with all positional weight
values for a particular eye condition.
Each cell of the binary feature map is then multiplied by its positional weight, and the sum of these
products yields a single number. This Binary Weighted Feature Value (BWFV),
which is a distinct integer for every eye position, is shown in Table 6.5. Finally, the distinct
values of BWFV are applied in this chapter to categorize the eye movements in various directions.
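The weighted-sum step can be sketched in a few lines. The map layout (two feature rows by six lead columns), the row-major cell numbering starting at 1, the example bit pattern, and the class ranges are all assumptions made for illustration. They are not taken from Tables 6.3 to 6.5.

```python
def bwfv(binary_map):
    """Binary Weighted Feature Value: sum of (cell number x cell value),
    with cells numbered from 1 in row-major order."""
    flat = [bit for row in binary_map for bit in row]
    return sum(position * bit for position, bit in enumerate(flat, start=1))

def classify(value, class_ranges):
    """Return the first label whose inclusive (low, high) range contains value."""
    for label, (low, high) in class_ranges.items():
        if low <= value <= high:
            return label
    return None

# Hypothetical binary feature map: rows = two features, columns = six leads.
example_map = [
    [1, 0, 0, 1, 0, 0],  # cells 1..6
    [0, 1, 0, 0, 0, 1],  # cells 7..12
]
value = bwfv(example_map)  # 1 + 4 + 8 + 12
hypothetical_ranges = {"left": (20, 30), "right": (31, 45)}
print(value, classify(value, hypothetical_ranges))
```

A sum of positional weights is not unique in general: cells {1, 4} and cells {2, 3} both give 5. The separation reported in Table 6.5 is therefore a property of the binary patterns observed in the data, not of the weighting scheme itself.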
Additionally, Specificity (Sp) measures the percentage of accurately detected true negative cases, while
Sensitivity (Se) indicates a test's capacity to discover positive cases.
For every class of eye movements, the suggested BWFV feature values provide discriminating separation.
A basic threshold-based binary classification has therefore
been applied to classify the eye movements. To ensure thorough validation of the classification approach,
a ten-fold cross-validation method is utilized, taking into account the size of the adopted dataset.
Initially, the features of the twenty subjects are separated into ten equal portions, with two subjects in
each portion. In each iteration, one portion (two subjects) is removed, and the classifier is assessed on
the remaining nine portions (eighteen subjects). After ten iterations of the procedure, the result for each
fold is displayed in Figure 6.11 and listed in Table 6.6. Overall, Table 6.6 reports an average accuracy
(Acc) of 95.65%, sensitivity (Se) of 95.63% and specificity (Sp) of 95.63%.
6.4 Discussion
Recent studies demonstrate that the fundamental characteristics derived from the eye movement sequence
can be employed to control the functionality of numerous assistive technologies. However, the analysis of
EEG signals is currently being investigated as an option to track the movement of the eye in different
directions due to the complexity involved in the classic EOG based approaches [6.13–6.15]. This suggests
that the eye movement can be detected using the same EEG setup that is often used to identify
neurological activity. Since it hasn't been thoroughly investigated yet, the detection of eye movement
using EEG-based methods is still seen as a promising field of study. To the best of our knowledge, there
are currently no publicly available online EEG databases that are annotated with respect to eye
movements. Thus, in the current chapter, six electrodes are applied to each subject's scalp in order to
non-invasively collect EEG signals for various eye-movement conditions in the institutional laboratory.
Initially, the adopted methodology extracts two attributes from each lead using a wavelet-based
approach and signal power. A single binary feature map is then created by combining these
features. Lastly, a straightforward threshold-based classifier is employed to detect the eye movement
based on a single distinct feature obtained from the binary feature map. Overall, a low-complexity
approach is used at every stage of the algorithm, aside from the wavelet transform, to make it suitable
for use in current assistive technology. The advantages of the suggested algorithm are outlined below:
1) Rigorous pre-processing of the acquired EEG signal is performed using a robust wavelet-based method.
The selected DWT-based method helps to reduce a variety of
artifacts in the signal prior to further processing.
2) A straightforward, distinct slope-change and signal averaging method is employed to remove eye
blink-related artifacts from the obtained EEG signal.
3) A novel binarization and feature mapping technique reduces the entire feature dimension to one,
negating the need for the algorithm's feature selection mechanism.
4) The binary values in the binary feature map itself exhibit unique patterns that correspond to each
condition of the eye. On their own, these distinct combinations make it easier to distinguish between
different eye conditions through visual inspection. The proposed study thus also provides a visual
categorization of eye conditions at an earlier stage of the process, before the automated classification of
the eye conditions using the distinct BWFV values.
5) The main benefit of the suggested approach is that it doesn't need a conventional, intricate classifier.
There is a significant amount of discrimination between the six different classes of eye motions in the
computed single BWFV feature values. Consequently, it is discovered that merely a basic, linear,
threshold-based classifier is adequate to accurately categorize the eye movement characteristics based on
the retrieved feature values. This eliminates the need for a training phase, significantly reducing
complexity.
Despite its numerous uses in the fields of neurology and psychology, the use of EEG signals has
certain disadvantages [6.24]. Six separate leads are used in this investigation to collect the EEG signal.
The requirement for an EEG acquisition cap and the proper positioning of six separate EEG
electrodes around the scalp frequently cause discomfort for the subjects. Depending on their physical and
mental health, the degree of discomfort experienced by individuals can present a significant obstacle to
the straightforward collection of EEG data.
In the current study, no real-time hardware platform is used; instead, all algorithm testing is carried out
and validated through software implementation. The evaluation results are produced using MATLAB on a
desktop computer. Executing the complete algorithm, from wavelet transformation, denoising, wavelet
coefficient selection, feature extraction and binarization through to classification, takes over twenty
seconds to categorize six different types of eye movements. This figure is the average execution time
over all of the EEG records.
The suggested method's promising performance supports its use in cutting-edge assistive
devices that cater to a wide range of populations with significant physiological barriers.
6.5 Conclusion
Eye movements can be utilized to communicate human expressions that can help with various types of
mental or physical disabilities. The current chapter proposes a reliable and precise algorithm that
distinguishes between the six various types of eye movements by analyzing the EEG signal. The
proposed methodology primarily uses a robust wavelet-based approach for feature extraction and signal
denoising. Using a special binarization method, the features extracted from the selected wavelet
coefficients are then combined to create a single binary feature map. A straightforward threshold-based
classification method is utilized to categorize six distinct eye movements using a discriminating feature
value obtained from the binary feature map. The experimental results acquired in this chapter show good
average detection accuracy and high execution speed when evaluated over several EEG signal records,
compared with related research.
The algorithm's promising results, high-speed execution, and simple methodology support its
compatibility with state-of-the-art assistive HMI devices. Future work will focus on integrating the
algorithm into real-time, multifunctional, portable personal assistive devices that can
acquire EEG signals wirelessly using fewer leads. Additionally, the algorithm will be extended to support
additional eye movements, such as blinking, to make commands easier to issue.
This chapter describes the use of the EEG signal for detection of mental stress.
Instead of using all the electrode positions of conventional EEG recordings, only
four pairs of electrodes are chosen to reduce the computational burden.
Statistical features obtained from the EEG signal are used instead of conventional
time or frequency domain features.
7.1 Overview
Mental stress is a significant emotional state which has an adverse effect on humans due to modern
lifestyle and work pressure. With the development of artificial intelligence, stress recognition has
demonstrated numerous beneficial applications in people's lives. Since human mental stress can be
accurately reflected by electroencephalogram (EEG) signals, stress recognition approaches based on
EEG signals have emerged as a key area of study in real-world and artificial intelligence applications.
However, the majority of currently used stress detection techniques show poor recognition performance,
which hinders the adoption of these techniques in real-world settings. Moreover, the computational
complexity of the features and signal processing stages is a major challenge for the application
of EEG in this field. To alleviate this problem, a simple algorithm is proposed for automated
recognition of human mental stress based on two easy-to-compute features extracted from the EEG
sub-bands generated using the Discrete Wavelet Transform (DWT). The features are derived from the signal
energy of the different sub-bands. SVM and k-NN classification techniques are adopted to classify
between the stressed and relaxed conditions based on these features. The average values of the obtained
accuracy, sensitivity and specificity are 98.73%, 98.63% and 98.82%, respectively. The results indicate
that the proposed technique can be implemented in monitoring systems for early detection of mental
stress, which may help to reduce the number of suicide attempts and prolonged mental illnesses.
7.3 Methodology
Figure 7.1 provides a block summary of the entire methodology used in the proposed EEG-based stress
detection algorithm. The algorithm consists of four essential components: 1) selection of EEG data from
DEAP dataset, 2) pre-processing of the EEG records using moving average filter and wavelet
transformation to remove various artifacts, 3) feature extraction where two easy to compute features are
extracted from the wavelet coefficients and 4) classification, where SVM and k-NN based classifiers are
used to classify between the stressed conditions and relaxed conditions.
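The idea of taking signal energy per wavelet sub-band can be sketched as follows. The mother wavelet and the decomposition depth are not given in this passage, so the orthonormal Haar wavelet and three levels are assumptions chosen for brevity. The function names are hypothetical, and the sketch stops at the band energies rather than guessing the two features actually used.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail).

    Expects an even number of samples; a trailing odd sample is ignored.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def subband_energies(x, levels):
    """Energies of detail bands D1..D_levels, followed by the final approximation."""
    energies = []
    current = list(x)
    for _ in range(levels):
        current, detail = haar_step(current)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in current))
    return energies

# Toy 8-sample segment standing in for one EEG channel.
segment = [4.0, 2.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
bands = subband_energies(segment, levels=3)
print(bands, sum(bands), sum(v * v for v in segment))
```

Because the transform is orthonormal, the band energies add up to the energy of the input segment, which gives a quick sanity check on any implementation.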
7.3.4 Classification
For each subject, there are four electrode positions, and for each electrode position there are two
feature values for each of the stressed and relaxed conditions. This gives sixteen different feature
arrays (4 electrode positions × 2 features × 2 conditions), each covering all 32 subjects, for a total of
16 × 32 = 512 feature values in the classification stage. The fundamental concept of the
classification stage is to create an optimum hyperplane that can identify and separate the two
classes based on the extracted features. The classification between stressed and relaxed
conditions is carried out using two standard classifiers. The literature shows the applicability and
accuracy of SVM [3.223-3.226, 7.14] and k-NN classifiers [3.225, 3.226, 7.15] for the analysis of
biomedical signals. Hence, for the present study, both SVM and k-NN classifiers are used to identify the
stressed and relaxed conditions. During classification, all the feature arrays are considered together,
and the entire dataset is divided into five equal parts to ensure thorough validation of the
classification approach. Four parts are used to train the model while the fifth part is used to test it.
This process is repeated five times to avoid classifier bias and to obtain the best-fit result.
These five iterations ensure that each part passes through both the training and testing phases. The
final decision boundary is determined from the average of the decision boundaries
obtained in the five iterations.
Moreover, three statistical measures, namely Accuracy (Acc), Sensitivity (Se) and Specificity (Sp), are
used to evaluate the algorithm's efficiency. Each measure is computed from four counts: TP
is the number of positive cases that are correctly identified as positive; TN is the number of
negative cases that are correctly identified as negative; FP is the number of negative cases that are
mistakenly identified as positive; and FN is the number of positive cases that are wrongly identified
as negative. Specificity (Sp) measures the percentage of true negative cases that are correctly
detected, while Sensitivity (Se) indicates a test's capacity to detect positive cases.
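In terms of the counts TP, TN, FP and FN, the three measures take their conventional forms. These are the standard definitions, stated here for completeness rather than copied from the original typeset equations.

```latex
\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%,
\qquad
\mathrm{Se} = \frac{TP}{TP + FN} \times 100\%,
\qquad
\mathrm{Sp} = \frac{TN}{TN + FP} \times 100\%
```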
The accuracy, sensitivity and specificity values obtained for the SVM classifier are 98.63%, 99.21% and
98.04%, respectively. k-NN classification yields values of 98.82%, 98.04% and 99.60% for the
same three parameters. Averaged over the two classifiers, this gives a high accuracy (Acc) of 98.73%,
sensitivity (Se) of 98.63%, and specificity (Sp) of 98.82%. The detailed
performance values for both the classifiers are summarized in Table 7.2.
7.6 Discussion
Given that the system's ability to detect stress is demonstrated and that there is a satisfactory
correlation with a simple feature extraction model, further work will focus on refining the feature
extraction and classification algorithms and conducting more experiments to test more precise stress
models in practical settings. Future efforts will also focus on identifying a person's emotional states in
addition to stress, including psychological issues, which are becoming more common in today's
society. Furthermore, the current algorithm makes it simple to provide continuous monitoring of mental
excitation for individuals with mental illnesses. This work can be extended to determine the state of an
individual experiencing mental shock, which could assist the doctor in taking the appropriate action and
lowering the likelihood of a suicide attempt. The algorithm could also be used to assess the mental
condition of a person with hearing and speech impairments. The algorithm will also be ported to a
suitable programming language to enable its use on portable devices, providing a suitable mechanism for
the assessment of stress in real environments and applications. This would yield a system that can detect
stress levels so that immediate action can be taken to prevent chronic stress and its health
consequences. Some of the major points regarding the proposed algorithm are summarized below:
1) For this study, only the mental stress has been taken into account. Many additional emotions can be
included in the algorithm.
2) Given that mental stress is on the rise in today's culture, the approach could be expanded to assess an
individual's mental stress levels on a continuous basis.
3) A robust wavelet-based method is used for thorough pre-processing of the acquired EEG data.
The selected DWT-based method helps to reduce a variety of
artifacts in the signal prior to further processing.
4) Only the EEG data from DEAP dataset is considered for the present analysis. In future, some more
public datasets might be used to increase the validity and robustness of the algorithm.
5) All of the analysis for this study is done on an offline software platform. Further development of the
same algorithm could enable its use in Internet of Things (IoT) based systems, allowing for direct
monitoring of the outcomes even from a remote place. The field of the Internet of Medical Things
(IoMT) will greatly benefit from this.
6) The classification process is carried out using only two, simple-to-calculate features. In future, a few
more features might be developed to make the algorithm stronger.
7) Real-time EEG signals for particular mental stress stimuli can be captured in the lab, and the
algorithm's performance can be examined.
7.7 Conclusion
Detecting stressed conditions helps in understanding the mental state of human beings, which supports
prediction and possible treatment in order to avoid negative consequences. The current work
proposes a reliable and precise algorithm that distinguishes between the stressed and relaxed conditions
by analyzing the EEG signal. The proposed methodology primarily uses a robust wavelet-based approach
for feature extraction and signal denoising. Subsequently, linear SVM and fine k-NN classification
techniques are employed to categorize the mental stress conditions with respect to the relaxed condition
using discriminating values obtained from the feature matrix. The algorithm is compatible with modern
assistive HMI devices owing to its robust methodology, rapid execution, and promising results.
Future work will focus on integrating the algorithm into multifunctional, portable personal assistive
devices that can acquire EEG signals wirelessly using a reduced number of electrodes. Additionally, the
algorithm will be extended to support additional emotional states in addition to stress detection.
This chapter concludes the thesis, highlighting the overall achievements and
shortcomings of the individual works performed.
It also discusses the future perspectives of the work undertaken for the thesis.
The development of algorithms for the automated analysis of two significant biosignals,
PPG and EEG, has been our focus. The algorithms are designed specifically to meet the present-day
need for portable and low-cost medical diagnostic equipment. However, the current study focuses
exclusively on the construction of algorithmic models, which have mostly been
evaluated and implemented at the software level. Our future research efforts will be concentrated on
creating a working prototype of an automated monitoring system that will allow for routine, reasonably
priced diagnosis and automated monitoring.
8.1 Conclusion
The main objective of the present thesis is to develop algorithms for automated analysis of the two most
significant bio signals: the photoplethysmogram (PPG) and electroencephalogram (EEG). The analyses of
the two signals are mainly focused on emotional state classification and mental stress detection. In order
to enable the automatic detection of primary emotions and mental stress conditions, the signals are
analyzed and certain features are extracted which can be effective for these applications. The algorithms
used for the present analysis are designed to enable substantial reductions in computational time without
sacrificing signal quality. The algorithms are specifically intended for use with automated monitoring
devices that have limited processing power. By using these intelligent, portable monitoring
devices, a greater number of people can benefit from quick, inexpensive, and simple diagnosis. The
specific tasks completed for the thesis are outlined here, emphasizing both their overall successes and
shortcomings.