
Project Report


StressSense: A Deep Learning Approach to Facial

Expression-Based Stress Detection

A Report Submitted in Partial Fulfilment of the Requirements for the

SN Bose Internship Program, 2024

Submitted by

Akash Suklabaidya and Arbi Chaliha

Under the guidance of

Mr. Biswanath Dey


Associate Professor
Department of Computer Science & Engineering
National Institute of Technology Silchar

Department of Computer Science & Engineering


NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR
Assam
June-July, 2024
ACKNOWLEDGEMENT

We would like to thank our supervisor, Mr. Biswanath Dey, CSE, NIT Silchar, for
his invaluable direction, encouragement, and assistance during this project. His helpful
suggestions and cooperation throughout this effort are gratefully acknowledged, as they
enabled us to conduct an extensive investigation and learn about many new subjects.

We extend our sincere gratitude to Mr. Biswanath Dey for his unwavering support and
patience, without which this project would not have been completed properly and on
schedule. From providing the datasets to answering our doubts and offering ideas whenever
we were stuck, this project would not have been feasible without him.

Akash Suklabaidya Arbi Chaliha

Department of Computer Science and Engineering


National Institute of Technology Silchar, Assam
Contents

1 Introduction

2 Related Work

3 Proposed Methodology
  3.0.1 Datasets:
  3.0.2 Model Architecture:

4 Results
  4.1 Training and validation performance
  4.2 Multi-class Evaluation
    4.2.1 Classification Performance Metrics:
    4.2.2 Confusion Matrix:
    4.2.3 Binary Evaluation:
Internship Summary

Internship Details
Title of Internship: StressSense: A Deep Learning Approach to Facial Expression-Based
Stress Detection
Internship Program: SN Bose Internship Program, 2024
Institution: National Institute of Technology Silchar, Assam, Department of Computer
Science and Engineering
Internship Duration: June - July, 2024

About the Internship


The internship project, titled StressSense, focused on developing a system for stress
detection using deep learning techniques on facial expressions. The system, leveraging
Convolutional Neural Networks (CNN) and MobileNetV2, aimed to recognize stress by
analyzing facial expressions captured via webcam in real time. Key aspects of the project
included:

• Data Collection: Utilized datasets like CK, CK+, and KDEF containing annotated
facial images.

• Model Training: Conducted using CNN and MobileNetV2 architectures to classify
emotions.

• System Implementation: Real-time monitoring system designed to notify users
of high stress levels.

• Results: MobileNetV2 demonstrated higher accuracy and stability in stress detection
compared to traditional CNN models.

This work contributes to advancements in stress detection and management by utilizing
deep learning in facial expression analysis, highlighting the potential for future research
in real-time emotional monitoring and well-being applications.

Chapter 1

Introduction

Face perception is crucial in social interactions, providing information about age,
gender, race, identification, thoughts, and emotions [1]. Accurate interpretation of
emotions conveyed through facial expressions is key for effective communication,
developing from infancy through adolescence [2],[3],[4].

Six universal facial expressions—happy, sad, angry, disgusted, fearful, and surprised—are
linked to significant individual emotions [5]. Understanding these expressions is essential
for mental health and social functioning, influencing empathy, bonding, and conflict
resolution. Nonverbal communication is also critical in professional settings, affecting
interactions, leadership, and collaboration.

Advancements in AI and machine learning have spurred interest in improving facial
expression interpretation. Facial recognition systems are being developed for security,
human-computer interaction, and mental health diagnostics. Automated systems
can identify early signs of emotional distress or mental health issues by analyzing
subtle facial expression variations.

In summary, studying facial expressions and developing assessment tools are vital.
These studies enhance our understanding of human emotions and social interactions,
creating opportunities for innovative applications in daily life and professional work.

Chapter 2

Related Work

Facial expression detection is increasingly used in psychoanalysis for stress management
due to technological advancements [12][13]. Facial expressions involve changes in the
mouth, eyes, and face, with eyes often conveying emotions. Emotional dysfunction, such as
reduced emotion identification, impairs the ability to read others’ facial emotions, affecting
social and interpersonal functioning in children and adults with mental illnesses like
anxiety and mood disorders [6][7][8][9]. Individuals with mental disorders often interpret
negative facial expressions with a bias, which may persist even without a current disorder
[6].

Adolescence is critical for mental health, with over half of adult mental illnesses
developing by age 14 [22]. Australian data shows that 1 in 7 young individuals have
anxiety disorders, and 1 in 16 experience depression. Symptoms of psychological distress
can appear in early adolescence before a formal diagnosis [23]. These trends are consistent
with data from other Western nations [24]. Youth with psychological issues may already
show abnormal identification of negative facial expressions even before a clinical diagnosis.

Studies show teenagers with depression often misinterpret sad facial expressions,
seeing other emotions as sadness, while those without depression tend to misread them
as happiness [10]. Adolescents with psychopathic tendencies also struggle with facial
recognition, particularly with fear and sadness [11].

Workplace stress in India significantly impacts employee well-being and productivity,
with 77% of employees attributing stress to decreased productivity and 82% linking it to
health issues like immune system and digestive problems [19]. Studies by Microsoft and
Springworks found increased burnout and disengagement due to extended work hours and
stress [20][21]. This highlights the need for improved mental health support and work-life
balance in Indian organizations.

Artificial intelligence, specifically deep learning, enhances facial emotion detection
systems like FaceNet, DeepFace, VGGface, and DeepID, with CNN models such as ResNet,
VGGNet, GoogleNet, and MobileNet being utilized [14]–[17]. MobileNetV2, in particular,
improves performance and efficiency for real-time applications, reducing computational
effort while maintaining accuracy, and is easily integrated into CNN architectures, making
it suitable for scenarios with limited processing capacity [25].

Chapter 3

Proposed Methodology

Figure 3.1: Overview of the Stress Detection System

(a) Webcam Integration: Captures video frames at intervals for continuous data analysis.

(b) Facial Region Extraction: Uses algorithms to extract facial regions from video frames.

(c) Emotion Classification: Classifies eight major emotions using a trained deep learning model.

(d) Stress Detection Algorithm: Analyzes emotion likelihoods to detect dominant emotions, prioritizing stress-related ones.

(e) User Notification and Feedback Loop: Alerts the user of potential stress and solicits feedback for accuracy.

(f) Continuous Monitoring: Processes images regularly for ongoing stress evaluation and timely intervention.
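To make steps (a)-(c) concrete, the following is a minimal sketch assuming OpenCV for
webcam capture and Haar-cascade face detection with a trained Keras model; the model
file name, input shape, and label order are illustrative assumptions, not taken from the
report.

    import cv2
    import numpy as np
    from tensorflow.keras.models import load_model

    EMOTIONS = ["anger", "contempt", "disgust", "fear",
                "happiness", "neutrality", "sadness", "surprise"]

    # Haar cascade shipped with OpenCV; one possible face-extraction algorithm
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    model = load_model("stress_model.h5")  # hypothetical trained-model file

    cap = cv2.VideoCapture(0)              # default webcam
    ret, frame = cap.read()                # one frame; loop at intervals in practice
    if ret:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
            face = cv2.resize(gray[y:y + h, x:x + w], (128, 128))
            face = face.astype("float32") / 255.0   # match training normalization
            face = face.reshape(1, 128, 128, 1)     # batch of one grayscale image
            probs = model.predict(face)[0]          # per-emotion likelihoods
            print("dominant emotion:", EMOTIONS[int(np.argmax(probs))])
    cap.release()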

3.0.1 Datasets:
Dataset Description:
The project utilized the Cohn-Kanade (CK), Cohn-Kanade Extended (CK+), and Karolinska
Directed Emotional Faces (KDEF) datasets, known for their extensive collections of
annotated facial images. The CK and CK+ datasets feature images with varied expression
intensities and sequences from neutral to peak emotions, while the KDEF dataset includes
standardized photos of actors portraying different emotions. All datasets were split 80:20
for training and testing to facilitate model training and validation.

Data Preprocessing:
The dataset, containing 32,900 grayscale images of faces spanning eight emotions, was
preprocessed by normalizing pixel values to the range 0-1 and resizing images from 48x48
to 128x128 pixels. Data augmentation using Keras’s ImageDataGenerator included random
rotations, shifts, shearing, zooming, and horizontal flipping. Finally, the dataset was split
80:20 into training and test sets, with augmented images used for training and normalized
images for testing.
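A sketch of this preprocessing using Keras's ImageDataGenerator is given below; the
directory paths, augmentation magnitudes, and batch size are illustrative assumptions.

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_datagen = ImageDataGenerator(
        rescale=1.0 / 255,        # normalize pixel values to [0, 1]
        rotation_range=15,        # random rotations
        width_shift_range=0.1,    # random horizontal shifts
        height_shift_range=0.1,   # random vertical shifts
        shear_range=0.1,          # shearing
        zoom_range=0.1,           # zooming
        horizontal_flip=True)     # horizontal flipping

    test_datagen = ImageDataGenerator(rescale=1.0 / 255)  # normalization only

    train_gen = train_datagen.flow_from_directory(
        "data/train", target_size=(128, 128),
        color_mode="grayscale", class_mode="categorical", batch_size=64)
    test_gen = test_datagen.flow_from_directory(
        "data/test", target_size=(128, 128),
        color_mode="grayscale", class_mode="categorical", batch_size=64)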

3.0.2 Model Architecture:

For our project, we compared the performance of a traditional Convolutional Neural
Network (CNN) and MobileNetV2 in detecting facial expressions to determine stress
levels.

1. Traditional CNN:

a. Input Layer: Receives unprocessed pixel data.
b. Convolutional Layer: Uses filters/kernels for edge and texture detection.
c. Activation Layer: Employs ReLU for non-linearity.
d. Pooling Layer: Reduces spatial dimensions (e.g., max pooling) to lower computational load and generalize features.
e. Fully Connected Layer: Integrates learned features for classification.
f. Output Layer: Generates final classification with softmax activation.
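One possible Keras realization of the layer stack (a)-(f) is sketched below; the filter
counts, layer depth, and optimizer are assumptions, as the report does not give exact
hyperparameters.

    from tensorflow.keras import Sequential, layers

    cnn = Sequential([
        layers.Input(shape=(128, 128, 1)),               # (a) raw pixel input
        layers.Conv2D(32, (3, 3), padding="same"),       # (b) edge/texture filters
        layers.Activation("relu"),                       # (c) ReLU non-linearity
        layers.MaxPooling2D((2, 2)),                     # (d) spatial downsampling
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),            # (e) fully connected
        layers.Dense(8, activation="softmax"),           # (f) 8-way emotion output
    ])
    cnn.compile(optimizer="adam", loss="categorical_crossentropy",
                metrics=["accuracy"])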

2. MobileNet V2:

a. Input Layer: Receives raw image data.
b. Convolutional Layer: Uses initial convolutions for basic information gathering.
c. Depthwise Separable Convolution: Breaks down convolution into depthwise and pointwise steps to reduce parameters and computations.
d. Inverted Residuals: Combines convolutions and skip connections for efficient feature extraction.
e. Linear Bottlenecks: Uses linear layers to maintain efficiency and preserve important information.
f. Activation Layer: Utilizes ReLU6 for improved low-precision calculations.
g. Pooling Layer: Employs Global Average Pooling to condense feature maps.
h. Fully Connected Layer: Smaller in size due to efficient prior layers.
i. Output Layer: Uses softmax for classification.
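For comparison, a transfer-learning sketch using the Keras MobileNetV2 application is
shown below. The depthwise separable convolutions, inverted residuals, and linear
bottlenecks of (c)-(e) are built into the backbone; freezing the base, the head sizes, and
stacking the grayscale images to three channels are assumptions, as the report does not
specify the training configuration.

    from tensorflow.keras import Model, layers
    from tensorflow.keras.applications import MobileNetV2

    # ImageNet-pretrained backbone; it expects 3-channel input, so the grayscale
    # images would be stacked to three channels beforehand (an assumption)
    base = MobileNetV2(input_shape=(128, 128, 3), include_top=False,
                       weights="imagenet")
    base.trainable = False                             # freeze pretrained features

    x = layers.GlobalAveragePooling2D()(base.output)   # (g) condense feature maps
    x = layers.Dense(128, activation="relu")(x)        # (h) small dense head
    out = layers.Dense(8, activation="softmax")(x)     # (i) 8-way emotion output

    mobilenet = Model(base.input, out)
    mobilenet.compile(optimizer="adam", loss="categorical_crossentropy",
                      metrics=["accuracy"])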

Chapter 4

Results

4.1 Training and validation performance

The graphs show that MobileNet V2 has consistent training accuracy and decreasing
training loss, with validation accuracy stabilizing after 20 epochs and validation loss
following a similar trend despite some fluctuations. This indicates effective learning,
strong generalization, and minimal overfitting.
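Curves of this kind are typically produced from the Keras training history, as in the
sketch below; the epoch count is an assumed setting, and train_gen/test_gen refer to the
generators from the preprocessing sketch.

    import matplotlib.pyplot as plt

    # Train and record per-epoch metrics; 30 epochs is an assumed setting
    history = cnn.fit(train_gen, validation_data=test_gen, epochs=30)

    plt.plot(history.history["accuracy"], label="training accuracy")
    plt.plot(history.history["val_accuracy"], label="validation accuracy")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()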

Figure 4.1: Training and validation accuracy and loss curves

The traditional CNN shows effective learning with consistent training accuracy and
decreasing loss but exhibits significant validation fluctuations. It ultimately generalizes
well, with higher test accuracy compared to training. MobileNet V2 demonstrates more
stable learning and validation metrics, making it more reliable and better suited for varied
deployments. While both models perform well, MobileNet V2’s stability and generalization
make it the preferred choice.

4.2 Multi-class Evaluation
4.2.1 Classification Performance Metrics:

Facial Expression   Precision   Recall   F1-Score   Support
anger               55%         54%      55%        945
contempt            0%          0%       0%         26
disgust             59%         47%      52%        159
fear                45%         12%      19%        691
happiness           81%         88%      84%        1810
neutrality          61%         49%      55%        1015
sadness             51%         68%      58%        1081
surprise            62%         81%      70%        846
Accuracy                                 64%        6573
Macro avg           52%         50%      49%        6573
Weighted avg        62%         64%      61%        6573

Table 4.1: Performance metrics for Traditional CNN

Facial Expression   Precision   Recall   F1-Score   Support
anger               64%         72%      68%        3780
contempt            0%          0%       0%         104
disgust             75%         58%      65%        636
fear                64%         37%      47%        2763
happiness           90%         93%      91%        7239
neutrality          62%         77%      69%        4057
sadness             74%         62%      67%        4322
surprise            76%         84%      79%        3380
Accuracy                                 74%        26281
Macro avg           63%         60%      61%        26281
Weighted avg        74%         74%      73%        26281

Table 4.2: Performance metrics for MobileNet V2

MobileNet V2 provides stable performance and better generalization, whereas the
conventional CNN shows greater variability and requires more fine-tuning.
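For reference, per-class metrics of this form can be computed with scikit-learn, as
sketched below; model stands for either trained network from the earlier sketches, and
x_test/y_test stand in for the preprocessed test images and their one-hot labels.

    import numpy as np
    from sklearn.metrics import classification_report, confusion_matrix

    labels = ["anger", "contempt", "disgust", "fear",
              "happiness", "neutrality", "sadness", "surprise"]

    y_prob = model.predict(x_test)            # per-emotion likelihoods
    y_pred = np.argmax(y_prob, axis=1)
    y_true = np.argmax(y_test, axis=1)        # from one-hot encoding

    print(classification_report(y_true, y_pred, target_names=labels, digits=2))
    print(confusion_matrix(y_true, y_pred))   # rows: true class, cols: predicted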

4.2.2 Confusion Matrix:


MobileNet V2 outperforms the CNN in emotion classification across most categories,
showing higher accuracy and robustness, especially for "happiness," "neutrality," "surprise,"
and "sadness." While both models struggle with "contempt" and "fear," MobileNet V2
notably excels in predicting "fear" and "disgust" more accurately than the CNN.

            anger  contempt  disgust  fear  happiness  neutrality  sadness  surprise
anger         510         0       39    31         61          71      192        41
contempt        3         0        0     3         15           4        1         0
disgust        42         0       74     2          6           1       29         5
fear          119         0        4    85         57          55      103       268
happiness      44         0        1    11       1588          36       66        64
neutrality     71         0        1    14        111         498      293        27
sadness       104         0        6    18         83         115      739        16
surprise       28         0        0    26         58          21       30       683

Table 4.3: Confusion Matrix for Traditional CNN


            anger  contempt  disgust  fear  happiness  neutrality  sadness  surprise
anger        2733         0       98   110        130         457      200        52
contempt       17         0        1     8         69           9        0         0
disgust       179         0      367    24         10           3       51         2
fear          536         0       10  1023         98         223      289       584
happiness      86         0        0    13       6720         239       35       146
neutrality    251         0        1    47        211        3142      319        86
sadness       416         0       14   166        164         861     2669        32
surprise       82         0        0   216        160          83       15      2824

Table 4.4: Confusion Matrix of MobileNet V2

4.2.3 Binary Evaluation:

The assigned stress weights for each facial expression are detailed in Table 4.5.

Facial Expression   Stress Weight   Stress Level
Anger               1.0             High stress
Contempt            0.7             Moderate stress
Disgust             0.9             High stress
Fear                1.0             High stress
Happiness           0.0             No stress
Neutrality          0.2             Low stress
Sadness             0.8             Moderate to high stress
Surprise            0.3             Low to moderate stress

Table 4.5: Stress Weights for Facial Expressions
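The binary stress decision implied by Table 4.5 can be sketched as a weighted sum of the
predicted emotion likelihoods followed by a threshold; the 0.5 cutoff and the example
probabilities are assumptions, as the report does not state a decision rule.

    # Stress weights taken directly from Table 4.5
    STRESS_WEIGHTS = {
        "anger": 1.0, "contempt": 0.7, "disgust": 0.9, "fear": 1.0,
        "happiness": 0.0, "neutrality": 0.2, "sadness": 0.8, "surprise": 0.3,
    }

    def stress_score(probs: dict[str, float]) -> float:
        """Weighted sum of per-emotion likelihoods, in [0, 1]."""
        return sum(STRESS_WEIGHTS[e] * p for e, p in probs.items())

    # Illustrative model output for one frame (not real data)
    probs = {"anger": 0.05, "contempt": 0.0, "disgust": 0.02, "fear": 0.55,
             "happiness": 0.03, "neutrality": 0.15, "sadness": 0.15,
             "surprise": 0.05}
    score = stress_score(probs)
    print(f"stress score = {score:.2f}, stressed = {score >= 0.5}")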

The model captures photos of the user via their computer’s camera and processes
them to assess stress levels based on facial expressions, enabling real-time stress monitoring.

Users provide feedback on detected stress levels, helping to verify system accuracy
and refine the model. This feedback loop ensures the system becomes more precise and
personalized over time.

By continuously improving through user input, the system offers effective stress man-
agement support and builds a dataset to understand stress patterns, ultimately enhancing
users’ well-being through advanced emotion recognition technology.

Figure 4.2: Sample images

Opportunity
Several potential future opportunities and areas for expansion exist for the "StressSense"
project:

• Expansion to Mobile and IoT Devices: As MobileNetV2 is optimized for low-power
devices, "StressSense" could be adapted for mobile platforms or wearable devices,
allowing users to monitor stress levels continuously on the go.

• Integration with Mental Health Platforms: The system could integrate with
existing mental health platforms, offering users seamless access to counseling or
therapy based on real-time stress data, enhancing mental health support accessibility.

• Longitudinal Stress Analysis: By aggregating and analyzing data over time,
the system could identify stress trends and patterns in individual users, potentially
alerting them to chronic stress before it becomes problematic.

• Use in Workplace Wellness Programs: Organizations could adopt this technology
in wellness programs, using it to monitor workplace stress levels and adjust
policies or workflows to promote a healthier work environment.

• Advanced Customization for Different User Demographics: By incorporating
age, gender, or cultural information, the model could be fine-tuned to account
for demographic differences in facial expressions, leading to more accurate stress
detection across varied populations.

• Privacy-Enhanced Stress Detection: Research into privacy-preserving methods,
such as on-device data processing, could help "StressSense" address privacy concerns
while maintaining effective stress monitoring, making it more suitable for widespread
use.

• Integration with Biofeedback Sensors: Future versions of "StressSense" could
integrate biofeedback sensors (e.g., heart rate, skin conductance) for a multi-modal
approach, enhancing the accuracy of stress detection by combining physiological
and facial cues.

• Development of a Richer Emotional Dataset: Creating or contributing to a
comprehensive facial expression and stress-related emotion dataset could support
further research and refinement of AI models for stress and mental health assessment.
Challenges Faced
The following potential challenges, spanning technology, hardware, and software, may be
encountered in the "StressSense" system:

• Data Collection and Quality
  – Challenge: Achieving high-quality, diverse training data for different facial
    expressions across age groups, ethnicities, and lighting conditions.
  – Future Impact: Poor dataset diversity could affect model accuracy in real-world
    applications, leading to bias in stress detection.

• Real-Time Processing Requirements
  – Challenge: Processing video streams from webcams in real time may demand
    high computational power and efficient algorithms.
  – Future Impact: Limited computational resources on some devices may result
    in latency or degraded performance, affecting real-time detection reliability.

• Hardware Limitations
  – Challenge: Variability in webcam quality and processing power across devices
    can affect the consistency of stress detection.
  – Future Impact: Lower-resolution webcams or devices with limited memory
    could struggle to capture facial details accurately, leading to incorrect
    classification.

• Privacy and Ethical Concerns
  – Challenge: Using webcam feeds for facial expression analysis raises privacy
    issues, requiring informed user consent and secure data handling.
  – Future Impact: Insufficient attention to data privacy may lead to user
    resistance or regulatory issues, especially if sensitive data is mishandled.

• User Feedback Mechanism
  – Challenge: Continuously refining the model based on user feedback is essential
    but could introduce subjectivity or require frequent updates.
  – Future Impact: Relying on subjective user input may create inconsistencies,
    impacting the model's learning and adaptation accuracy over time.

• Integration with Limited-Resource Devices
  – Challenge: The model may need to run on mobile devices or laptops with
    constrained resources.
  – Future Impact: Running complex models on low-resource devices could drain
    battery life quickly or require cloud offloading, which can affect latency and
    user experience.
Learning Outcomes
The "StressSense" project provided a range of technical and practical learning outcomes:

• Understanding Deep Learning for Image Classification: Learned the implementation
of CNN and MobileNetV2 models, exploring their architectures and
methods for image classification, especially within the domain of facial expression
and emotion recognition.

• Applying Transfer Learning: Gained experience using pre-trained models (e.g.,
MobileNetV2), understanding how transfer learning can accelerate model training
and improve accuracy for specific tasks like stress detection.

• Data Preprocessing and Augmentation Techniques: Enhanced skills in
handling and augmenting image datasets, learning methods like normalization,
resizing, and transformation to improve model robustness and prevent overfitting.

• Evaluating Model Performance: Developed the ability to analyze model metrics,
such as accuracy, precision, recall, and F1-score, and interpret results through
validation metrics and confusion matrices to refine model performance.

• Emotion Recognition and Psychological Insights: Learned about psychological
and behavioral aspects of emotion detection, applying technical solutions to address
real-world mental health issues through emotion classification.

• Real-Time System Development: Gained insight into real-time data processing,
integrating webcam functionality for continuous monitoring and feedback, and
achieving system efficiency for real-world stress monitoring applications.

• Ethical Considerations in AI Applications: Recognized the ethical implications
of deploying AI for mental health, understanding the importance of user consent,
privacy, and the need for reliable detection mechanisms.

These outcomes reflect the comprehensive skill set gained, covering both technical
expertise in AI and practical knowledge for real-world applications.
Conclusion
This study assessed the effectiveness of conventional CNN and MobileNetV2 models
in accurately recognizing facial emotions and detecting stress levels using the KDEF
dataset. The findings demonstrated that MobileNetV2 surpassed the conventional
CNN in terms of accuracy, stability, and generalization. The study successfully
accomplished stress detection by classifying emotions such as anger, disgust, fear,
and sadness as indications of stress. The enhanced architecture of MobileNetV2
enables more accurate stress detection, making it a reliable tool for real-time stress
monitoring and management. The model's accuracy was further improved by
continuous feedback and adaptive learning, providing a promising approach for
enhancing users' overall well-being by effectively managing stress.
