
Facial Expression Detection

Anish Panda, Priyansh Sachidanand Singh, and Vikas


VIT-AP University, {anish.21bce8748, priyansh.21bce7306, vikas.21bce}@vitapstudent.ac.in

Abstract – Our group designed a simple convolutional neural network for facial expression detection based on the FER2013 dataset. Keras was used to build the neural network model. The model was first trained on the original dataset and then on a generated (augmented) dataset to achieve low variance; the model reached an accuracy of 67.34% after the complete training.

Index Terms – CNN, emotion detection, human expression identification.

I. INTRODUCTION

The advent of the digital age has ushered in a plethora of opportunities for technological advancement, and among them, the ability of computers to discern and accurately interpret human emotions and facial expressions stands as a promising frontier. Such an ability holds immense potential for various applications, permeating domains like Virtual Reality (VR) gaming, online education, driving assistance, medical care, and support for visually impaired individuals. By enabling computers to comprehend human emotions, a new realm of interactive experiences can be unlocked, revolutionizing the way we interact with machines and each other.

In the realm of VR gaming, facial expression detection can significantly enhance user immersion and engagement. Imagine a gaming environment that reacts dynamically to a player's emotional cues, adapting the gameplay to evoke tailored responses. Likewise, in online education, facial expression detection can provide invaluable feedback to educators, enabling them to gauge students' attentiveness and emotional states during remote learning sessions, leading to personalized teaching approaches for enhanced comprehension and retention.

The potential implications in the realm of road safety cannot be overstated. Advanced driver-assistance systems equipped with facial expression detection cameras can monitor drivers in real time to detect signs of fatigue or distraction, thereby reducing the risk of accidents and ensuring safer journeys for all road users.

In the healthcare sector, facial expression detection can aid in assessing patients' emotional well-being and pain levels. Physicians and caregivers can employ this technology to provide better-tailored care and empathetic support, thus improving the overall patient experience.

Furthermore, the positive impact of facial expression detection extends to supporting the visually impaired. By providing insights into the emotions of others, computers can empower the blind to better navigate social interactions, leading to more meaningful and enriching conversations.

Despite the compelling potential of facial expression detection, challenges persist in achieving real-time accuracy under diverse conditions. Presently, computers' ability to predict facial expressions in real-world scenarios remains limited, impeding seamless human-computer communication. Additionally, human observers themselves possess only a moderate level of proficiency in recognizing others' expressions, leaving room for improvement in this field.

The fusion of artificial intelligence, computer vision, and deep learning methodologies, particularly Convolutional Neural Networks (CNNs), is instrumental in addressing these challenges. By leveraging vast datasets and sophisticated neural network architectures, researchers endeavor to train machines to emulate human-like perceptual abilities, bridging the communication gap between humans and computers and rendering interactions more effective and intuitive.

II. LITERATURE SURVEY

TABLE I
COMPARING SIMILAR APPROACHES

Research Paper Name | Method Used | Accuracy (%)
Facial Expression Recognition Using CNN with Keras (Apeksha Khopkar, et al.) | CNN | 66.7
Facial Emotion Recognition: State of the Art Performance on FER2013 (Yousif Khaireddin, et al.) | VGGNet | 73.28
Hybrid Facial Expression Recognition (FER2013) Model for Real-Time Emotion Classification and Prediction (Ozioma Collins Oguine, et al.) | DCNN and Haar Cascade deep learning architectures | 70.04
Real Time Face Expression Recognition along with Balanced FER2013 Dataset using CycleGAN (F. Mazen, et al.) | CycleGAN | 91.76
Deep-Emotion: Facial Expression Recognition Using Attentional Convolutional Network (Shervin Minaee, et al.) | Attentional convolutional network | 70.02
Facial Expression Recognition with CNN Ensemble (K. Liu, et al.) | CNN ensemble | 65.03
Local Learning with Deep and Handcrafted Features for Facial Expression Recognition (Georgescu, M., et al.) | Local learning, SVM and CNN | 75.42
Going Deeper in Facial Expression Recognition using Deep Neural Networks (Ali Mollahosseini, et al.) | Deep neural network | 66.4
Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition (Mouath Aouayeb, et al.) | Vision Transformer with Squeeze and Excitation | -
Our Approach | CNN with Mixed Training | 67.34

III. PROPOSED WORK

This study aims to develop an effective CNN-based model for facial expression detection using the FER-2013 dataset, comprising 30,000 training images and 3,500 testing images, with each image being 48x48 pixels. The dataset encompasses seven distinct emotions: 'Happy,' 'Sad,' 'Fear,' 'Disgust,' 'Anger,' 'Surprise,' and 'Neutral.' Preprocessing steps, including normalization and resizing, were performed on the dataset to ensure optimal model training. Subsequently, a two-step training approach was adopted, which involved training the model on the original preprocessed dataset and then on an augmented dataset.

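To make the preprocessing step concrete, the following is a minimal sketch of how the normalization described above might be applied, assuming the common Kaggle distribution of FER-2013 as a CSV of space-separated pixel strings; the file name and column names are assumptions, not details taken from the paper:

```python
import numpy as np
import pandas as pd

# Hypothetical path: the Kaggle FER-2013 release ships as a CSV with an
# "emotion" label column and each 48x48 image as a space-separated pixel string.
df = pd.read_csv("fer2013.csv")

# Parse each pixel string into a 48x48 single-channel image and normalize
# intensities to [0, 1], the normalization step described above.
X = np.stack([
    np.asarray(px.split(), dtype="float32").reshape(48, 48, 1)
    for px in df["pixels"]
]) / 255.0
y = df["emotion"].to_numpy()  # integer labels for the seven classes
```
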
The CNN model architecture was meticulously designed to capture intricate features within facial expressions. The model consists of multiple convolutional layers, each followed by a ReLU activation function to introduce non-linearity. Pooling layers (MaxPooling2D) were incorporated to down-sample the spatial dimensions while retaining important features. Dropout layers were strategically added to mitigate overfitting by randomly deactivating neurons during training. Furthermore, dense layers with dropout regularization were appended to the network, followed by the final SoftMax activation layer with the number of neurons corresponding to the number of classes.

FIGURE I
OUR CNN MODEL ARCHITECTURE

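A minimal Keras sketch of an architecture in this style is shown below. The paper specifies convolutional blocks with ReLU, MaxPooling2D, dropout, L2-regularized dense layers, and a seven-way SoftMax output, but not the exact depth, filter counts, or dropout rates, so those values are illustrative assumptions:

```python
from tensorflow.keras import Sequential, layers, regularizers

NUM_CLASSES = 7  # Happy, Sad, Fear, Disgust, Anger, Surprise, Neutral

def build_model() -> Sequential:
    model = Sequential([layers.Input(shape=(48, 48, 1))])
    # Convolutional blocks: Conv2D + ReLU with pooling and dropout in
    # between; the filter counts here are assumptions, not from the paper.
    for filters in (64, 128, 256):
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D())
        model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    # Dense layer with dropout and L2 regularization, as described above.
    model.add(layers.Dense(256, activation="relu",
                           kernel_regularizer=regularizers.l2(0.01)))
    model.add(layers.Dropout(0.5))
    # Final SoftMax layer: one neuron per emotion class.
    model.add(layers.Dense(NUM_CLASSES, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```
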
The training phase involved two key steps. Initially, the model was trained on the preprocessed dataset. Subsequently, a second training phase was executed using data augmentation. Augmentation techniques included rotation, horizontal and vertical shifts, shear transformations, and zooming, introducing diversity into the training set. The augmentation process was facilitated using the ImageDataGenerator module. For both training phases, the model was trained with a batch size of 128 for 100 epochs. Early stopping was implemented with a patience of 20 epochs to mitigate the risk of overfitting.

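The two training phases might be wired up as follows; this is a sketch assuming the `build_model` helper above and that the `X`/`y` arrays from the preprocessing snippet have been split into `X_train`/`y_train` and `X_val`/`y_val` (e.g., with scikit-learn's train_test_split). The augmentation ranges are assumptions, since the paper names the operations but not their magnitudes:

```python
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Early stopping with a patience of 20 epochs, as described above.
early_stop = EarlyStopping(monitor="val_loss", patience=20,
                           restore_best_weights=True)
model = build_model()

# Phase 1: train on the original preprocessed dataset.
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          batch_size=128, epochs=100, callbacks=[early_stop])

# Phase 2: continue training on augmented batches. The transformation
# ranges below are illustrative assumptions.
augmenter = ImageDataGenerator(rotation_range=15,
                               width_shift_range=0.1, height_shift_range=0.1,
                               shear_range=0.1, zoom_range=0.1)
model.fit(augmenter.flow(X_train, y_train, batch_size=128),
          validation_data=(X_val, y_val),
          epochs=100, callbacks=[early_stop])
```
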
For evaluation, the model was subjected to the testing dataset, and performance metrics were computed. These metrics included accuracy, precision, recall, and F1-score, which provide insights into the model's ability to accurately classify different facial expressions. The model's ability to generalize was also evaluated, and the test accuracy was reported.

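These metrics can be computed in a few lines with scikit-learn; the sketch below assumes a held-out test split (`X_test`/`y_test`), the trained `model`, and the standard FER-2013 label order, which the paper does not spell out:

```python
import numpy as np
from sklearn.metrics import classification_report

# Assumed FER-2013 label order; the paper does not specify the mapping.
EMOTIONS = ["Anger", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

# Predicted class = argmax over the seven SoftMax outputs.
y_pred = np.argmax(model.predict(X_test), axis=1)

# Accuracy plus per-class precision, recall, and F1-score in one call.
print(classification_report(y_test, y_pred, target_names=EMOTIONS))
```
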
The software ecosystem utilized for this research included TensorFlow and its related libraries. The model was constructed using the Sequential API, while layers such as Conv2D, Dense, MaxPooling2D, and Dropout were employed to build the neural architecture. The Adam optimizer was selected for training, and early stopping was implemented to halt training if the validation loss plateaued. L2 regularization was incorporated into the dense layers to curb overfitting tendencies.

The presented methodology serves as a robust framework to address the challenges associated with facial expression detection and lays the groundwork for future improvements, adaptations, and advancements in this domain.

IV. RESULT

The developed CNN-based facial expression detection model underwent comprehensive training and evaluation, yielding insightful performance metrics. The model's effectiveness was assessed through a series of experiments, considering both the original preprocessed dataset and the augmented dataset.

After training on the original preprocessed dataset, the model achieved a training accuracy of approximately 63.90%. The validation accuracy reached 64.28%, reflecting relatively consistent performance. However, a potential concern was observed: the validation loss (0.9918) was slightly higher than the training loss (0.9587), suggesting a risk of overfitting.

To address this, the model was subjected to training with the augmented dataset, incorporating diverse transformations to enhance its ability to generalize. Following the mixed training approach, the model demonstrated improved training performance, with a training accuracy of 76.91%. The validation accuracy exhibited a more substantial increase, reaching 75.72%. Importantly, the validation loss fell to 0.7143, indicating a reduction in overfitting tendencies.

FIGURE II
ACCURACY OF EACH RESPECTIVE EMOTION

The model's performance was further evaluated on the testing dataset, demonstrating the generalization capabilities acquired during training. The test accuracy reached 72.30%, reaffirming the model's efficacy in accurately classifying facial expressions on unseen data. Additionally, confusion matrices were computed for each emotion class, providing a comprehensive understanding of the model's class-specific performance.

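The per-emotion accuracies of the kind shown in Figure II can be derived from a confusion matrix; below is a minimal sketch reusing the `y_test`, `y_pred`, and `EMOTIONS` names from the evaluation snippet:

```python
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)

# Per-class accuracy (recall): correct predictions on the diagonal
# divided by the number of true samples in each row.
per_class_acc = cm.diagonal() / cm.sum(axis=1)
for emotion, acc in zip(EMOTIONS, per_class_acc):
    print(f"{emotion}: {acc:.2%}")
```
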
FIGURE III
OUR CNN MODEL ARCHITECTURE

In summary, the results highlight the significance of the mixed training approach, which involved training the model on both the original preprocessed dataset and an augmented dataset. This strategy contributed to a substantial reduction in overfitting tendencies, resulting in improved validation accuracy and loss. The achieved test accuracy underscores the model's proficiency in facial expression detection. Despite this success, avenues for further enhancement exist, such as fine-tuning hyperparameters, exploring more sophisticated architectures, and leveraging advanced regularization techniques to achieve even more robust performance.

The results affirm the feasibility of the proposed methodology and pave the way for future research and advancements in the domain of facial expression detection using CNN-based models.

V. CONCLUSION

The ability to accurately recognize the facial expressions of another human without human involvement is proving to be more significant than ever, and the need for it grows as time progresses and technology advances. Our model provides a solution to this problem while being extremely simple and easy to improve upon. We used 10 convolutional layers, with max pooling and dropout in between; the model was first trained on the original dataset and then on the generated dataset to avoid overfitting as much as possible. The early-stopping callback further helped in the same, and as a result, we were able to achieve 67.34% accuracy on the testing dataset. Although the model performed fairly well, facial expressions in real life are quite complex and normally cannot simply be put into seven classes. Beyond the complexity of human nature, lighting and brightness play a huge role in the model's ability to recognize expressions; furthermore, the captured image may show a partially blocked face or may be blurry, which can prevent the model from detecting anything at all. We need to continue working on these problems.