
Analysis of Facial Expression Recognition Using Frequency Domain Neural Networks

A Project Report Submitted in Partial Fulfillment of the Requirements
for the Award of the Degree of

BACHELOR OF TECHNOLOGY

IN

ELECTRONICS AND COMMUNICATION ENGINEERING

Submitted by

Mohammed Waqar Younus 19881A04F1


Akula Sailatha 19881A04C5
Atthapuram Neha Reddy 19881A04C7

SUPERVISOR
Dr. G.A.E. Satish Kumar
Professor and Head, Dept. of ECE

Department of Electronics and Communication Engineering

March, 2023

CERTIFICATE

This is to certify that the project titled Analysis of Facial Expression Recognition Using Frequency Domain Neural Networks is carried out by

Mohammed Waqar Younus 19881A04F1


Akula Sailatha 19881A04C5
Atthapuram Neha Reddy 19881A04C7

in partial fulfillment of the requirements for the award of the degree of


Bachelor of Technology in Electronics and Communication Engineering

during the year 2022-23.

Signature of the Supervisor          Signature of the HOD
Dr. G.A.E. Satish Kumar              Dr. G.A.E. Satish Kumar
Professor and Head, ECE              Professor and Head, ECE

Project Viva-Voce held on

Examiner

Kacharam (V), Shamshabad (M), Ranga Reddy (Dist.)–501218, Hyderabad, T.S.


Ph: 08413-253335, 253201, Fax: 08413-253482, www.vardhaman.org
Acknowledgement

The satisfaction that accompanies the successful completion of this task would be incomplete without mention of the people who made it possible, whose constant guidance and encouragement crowned all our efforts with success.

We wish to express our deep sense of gratitude to Dr. G.A.E. Satish Kumar, Professor and Head, ECE, and Project Supervisor, Department of Electronics and Communication Engineering, Vardhaman College of Engineering, for his able guidance and useful suggestions, which helped us in completing the project in time.

We are particularly thankful to Dr. G.A.E. Satish Kumar, Head of the Department of Electronics and Communication Engineering, for his guidance, intense support, and encouragement, which helped us mould our project into a successful one.

We show gratitude to our honorable Principal Dr. J.V.R. Ravindra, for


providing all facilities and support.

We avail this opportunity to express our deep sense of gratitude and heartfelt thanks to Dr. Teegala Vijender Reddy, Chairman, and Sri Teegala Upender Reddy, Secretary of VCE, for providing a congenial atmosphere to complete this project successfully.

We also thank all the staff members of the Electronics and Communication Engineering department for their valuable support and generous advice. Finally, thanks to all our friends and family members for their continuous support and enthusiastic help.

Mohammed Waqar Younus


Akula Sailatha
Atthapuram Neha Reddy

Abstract

Facial recognition technology has gained a great deal of attention and scrutiny in recent years, especially with its increased use in fields such as law enforcement, marketing, and security. Facial expressions are a vital aspect of human communication, as they convey a range of emotions that words alone cannot express. Emotional expressions are among the most important signals of the face, as they provide valuable information about a person's inner feelings, personality, motivations, and intentions. Facial expression recognition technology can help us better understand and analyse facial expressions of emotion, providing valuable insights into human behavior and emotions. To better understand facial expressions, we use a deep learning approach based on frequency-domain neural networks, which differs from the CNN approach. FreNet is a neural network architecture that processes images in the frequency domain, unlike convolutional neural networks (CNNs), which process images in the spatial domain. Processing images in the frequency domain offers two advantages: the reduction of spatial redundancy and efficient computation. By analysing frequency-domain images, FreNet achieves efficient computation and faster training times than traditional CNNs while maintaining or even improving performance. Additionally, FreNet can handle images with varying sizes, aspect ratios, and rotations, making it a more versatile and flexible architecture.

Keywords: Deep learning; facial expression recognition; frequency domain analysis.

Table of Contents

Acknowledgement
Abstract
List of Tables
List of Figures
Abbreviations
CHAPTER 1 Introduction
  1.1 Background Study
  1.2 Motivation
  1.3 Problem Statement
  1.4 Objective
  1.5 Existing System
    1.5.1 Disadvantages
  1.6 Proposed System
    1.6.1 Advantages
  1.7 Flow Diagram
CHAPTER 2 Literature Survey
  2.1 Literature Survey
CHAPTER 3 Methodology
  3.1 Introduction to Python
    3.1.1 NumPy
    3.1.2 OpenCV
    3.1.3 Pandas
    3.1.4 TensorFlow
  3.2 Deep Learning
  3.3 Neural Networks
  3.4 Module Description
    3.4.1 Facial Expression Recognition
    3.4.2 Frequency Domain Analysis
  3.5 Proposed Model
    3.5.1 Design of a Neural Network
  3.6 Terminologies
CHAPTER 4 Architecture
  4.1 Network Architecture
  4.2 Module Division
    4.2.1 Image Preprocessing
    4.2.2 Learnable Multiplication Kernel (LMK)
    4.2.3 Summarization Layer
  4.3 Classification Layer
  4.4 Basic FreNet
  4.5 Block FreNet
    4.5.1 Weight-Shared Multiplication Kernel
    4.5.2 Block Sub-sampling
CHAPTER 5 Experimental Framework
  5.1 Experimental Setup
  5.2 Dataset Description
  5.3 Result
CHAPTER 6 Conclusions and Future Scope
  6.1 Conclusions
  6.2 Future Scope
REFERENCES
List of Tables

5.1 Number Distribution of Seven Expressions in FER2013 Database
5.2 Number Distribution of Seven Expressions in FER2013 Database After Data Augmentation
5.3 Recognition Results on FER2013 Database

List of Figures

1.1 Recognising different facial expressions
1.2 Recognising different facial expressions
1.3 Flow Diagram
3.1 Applications of NumPy
3.2 Applications of OpenCV
3.3 Applications of Pandas
3.4 Applications of TensorFlow
3.5 Neural Network in Face Recognition
3.6 Speech recognition
3.7 Vehicle identification
3.8 Medical purpose
3.9 Agriculture Usage
3.10 Block diagram of facial expression recognition model
3.11 Example of facial landmarks
3.12 Some examples of AUs
3.13 Disgust expression
3.14 Fear expression
3.15 Happy expression
3.16 Neutral expression
3.17 Sad expression
3.18 Surprise expression
4.1 Module Division
4.2 Multiplication layer using LMK and biases
4.3 Summarization layer
4.4 Basic FreNet
5.1 Running window
5.2 Recognizing Happy Expression
5.3 Recognizing Sad Expression
5.4 Recognizing Angry Expression

Abbreviations

Abbreviation   Description

FER            Facial Expression Recognition

LMK            Learnable Multiplication Kernel

FNN            Frequency Neural Network

DCT            Discrete Cosine Transform

FDNN           Frequency Domain Neural Network

CNN            Convolutional Neural Network

BSS            Block Sub-sampling


CHAPTER 1

Introduction

The technique of recognising and comprehending human emotions from their facial expressions is known as facial expression recognition. It is an essential field of study in psychology, computer vision, and artificial intelligence.
A person may show a wide range of emotions through their facial expressions, including happiness, sorrow, anger, fear, surprise, and contempt. In areas like psychology, social interaction, and human-computer interaction, it is crucial to accurately identify and comprehend these emotions.
Facial characteristics including the eyes, brows, lips, and nose are analysed as part of facial expression recognition in order to identify and categorise emotions. Computer vision methods including deep learning, machine learning, and image processing are frequently used in the process.
Facial expression recognition may be used to enhance human-computer interaction, add realism to virtual reality and gaming settings, and help diagnose and treat illnesses like autism, depression, and anxiety.
In social interaction, emotional expression, and cognitive processing, facial expressions are a crucial component of human communication. They are necessary for people to communicate, comprehend others, and function in challenging social settings.
Overall, the science of facial expression recognition is developing quickly and has the potential to alter how we use technology and comprehend human emotions.

1.1 Background Study
Facial expressions have been an important means of communication for
humans since the beginning of human evolution. However, the study of facial
expressions as a scientific field is relatively recent.
In ancient times, philosophers such as Aristotle and Plato wrote about
the relationship between facial expressions and emotions. In the Middle Ages,
the Italian philosopher and theologian Thomas Aquinas argued that emotions
could be recognized by changes in facial expression.
It wasn’t until the 19th century that scientists began to study facial
expressions in a systematic way. One of the first scientists to understand the
significance of facial expressions in human communication was Charles Darwin.
In his book "The Expression of the Emotions in Man and Animals", published in 1872, Darwin suggested that many facial expressions are common across cultures and are innate rather than learned.
In the early 20th century, psychologists such as William James and Carl
Jung continued to study facial expressions and their connection to emotions.
In the 1960s, psychologist Paul Ekman began studying facial expressions in
different cultures and developed a system for categorizing different feelings
based on facial muscle movements.
Today, the study of facial expressions is a significant field in disciplines including psychology, neuroscience, and computer science. Researchers have developed sophisticated methods for analyzing and categorizing facial expressions, and have made significant strides in understanding the neural and psychological mechanisms underlying facial expressions and their connection to emotions.



1.2 Motivation
To better understand facial expressions, we use a deep learning approach based on frequency-domain neural networks, which differs from the CNN approach. FreNet is a neural network architecture that processes images in the frequency domain, unlike convolutional neural networks (CNNs), which process images in the spatial domain. Processing images in the frequency domain has various benefits, including efficient computation and the removal of spatial redundancy. By processing images in the frequency domain, FreNet achieves efficient computation and faster training times than traditional CNNs while maintaining or even improving performance. Additionally, FreNet can handle images with varying sizes, aspect ratios, and rotations, making it a more versatile and flexible architecture.

1.3 Problem Statement


Recognizing facial expressions helps determine people's emotional states. It can be used in industries including marketing, education, and healthcare, where it may be used to monitor the emotional well-being of patients, personalize educational content based on the emotional state of learners, and understand consumer behavior and preferences.

Figure 1.1: Recognising different facial expressions



1.4 Objective
The main objective of the project is to recognise the different facial
expressions using frequency domain neural networks.

Figure 1.2: Recognising different facial expressions

1.5 Existing System


The existing facial features used for FER can be further divided into the spatial domain and the frequency domain. In the spatial domain, facial characteristics can be assessed using geometry and image gradients. In the frequency domain, the high-frequency components correspond to noise and edges, while the low-frequency components offer an overall assessment of image intensity. As a result, image properties may be determined via frequency analysis. In traditional image processing, frequency-domain processing is crucial for efficient computation and the elimination of spatial redundancy.



1.5.1 Disadvantages
To learn features in the frequency domain, we first propose the learnable multiplication kernel (LMK) and construct several multiplication layers. A summarization layer is then proposed to offer additional high-level features following the multiplication layers. Based on the properties of the discrete cosine transform (DCT), we apply multiplication layers and summarization layers to create the Basic-FreNet, which can offer high-level features on the widely used DCT feature. Finally, to further enhance the performance of the Basic-FreNet, we propose the Block-FreNet, which combines the weight-shared multiplication kernel for feature learning with block subsampling for dimension reduction.

1.6 Proposed System


We propose a new frequency-based deep learning method for FER. To the best of our knowledge, this is the first attempt to build a fully frequency-based deep learning model for FER. We design the LMK and multiplication layers for learning frequency-domain features. The multiplication layers can learn features from the upper-left DCT coefficients, a widely used handcrafted feature, and produce high-level features. After the multiplication layers, we add a summarization layer to further generate high-level features and improve the discriminability of the learned feature; as a consequence, performance improves. To further enhance performance, we propose the Block-FreNet, in which the weight-shared LMK is designed for feature learning and the BSS for dimension reduction in the frequency domain.



1.6.1 Advantages
Certain properties, such as visual noise and spatial redundancy, are more pronounced and simpler to handle in the frequency domain than in the spatial domain, so we can exploit these benefits for face analysis. Meanwhile, an image filter can be implemented as an element-wise multiplication, which is computationally efficient in the frequency domain.
To filter face data for feature extraction, we can use element-wise multiplication. A high-pass filter, for instance, can produce a sharper picture by keeping high frequencies and discarding low frequencies, as the sketch at the end of this subsection illustrates.
For face generation networks, it is currently challenging to produce faces of high quality, which might harm model performance. Such networks do not inherit the benefits of frequency-domain analysis, which can learn frequency-based features, but our Block-FreNet does. The learned features are obtained in the frequency domain, which differs fundamentally from the spatial domain.
To further improve this procedure, we propose the LMK and multiplication layer, which inherit the benefits of frequency-domain analysis. The LMK learns relevant features during network training and functions as an image filter in the frequency domain to eliminate redundant information.
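To make the element-wise filtering idea concrete, the following minimal Python sketch (using only NumPy) implements a generic frequency-domain high-pass filter; it is an illustration of the principle, not the LMK itself, and the cutoff of 8 is an arbitrary choice:

    import numpy as np

    def high_pass_filter(image, cutoff=8):
        """Suppress low frequencies via one element-wise multiplication."""
        # Move to the frequency domain; shift the DC term to the centre
        freq = np.fft.fftshift(np.fft.fft2(image))

        # Binary mask that zeroes a small square of low frequencies
        mask = np.ones(freq.shape)
        h, w = image.shape
        cy, cx = h // 2, w // 2
        mask[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = 0

        # The filter itself is just an element-wise multiplication
        filtered = freq * mask

        # Back to the spatial domain; keep the real part
        return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))

    sharpened = high_pass_filter(np.random.rand(128, 128))

A learnable kernel replaces the fixed binary mask with weights that are adjusted during training, which is the idea behind the LMK.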



1.7 Flow Diagram

Figure 1.3: Flow Diagram



CHAPTER 2

Literature Survey

2.1 Literature Survey


Literature Survey is an essential component of any research project. The
purpose of a literature survey is to provide a comprehensive and critical
analysis of the existing literature related to a specific research topic. It
involves a systematic and thorough review of academic articles, books, and
other relevant sources of information to identify the current state of knowledge
in the field, gaps in the research, and future directions for investigation. A
literature survey is an important way to establish the context for a research
project and to ensure that the research is based on a solid foundation of
knowledge and understanding. This section presents a literature survey on the
topic of Facial Expression Recognition with different methods. Below are brief
summaries of each of the papers we examined.

• X. Liu, B. V. K. V. Kumar, P. Jia, and J. You present a novel approach


to identity-disentangled facial expression recognition using hard negative
generation. The authors address the problem of identity and expression
variations in facial recognition systems, which can lead to reduced perfor-
mance in recognizing facial expressions accurately. The proposed method
generates hard negative samples by training a classifier to distinguish be-
tween identity and expression variations, and then selecting samples that
are difficult to classify. The performance of the face expression recogni-
tion system is then enhanced by adding these examples to the training
data. The authors demonstrate that their method beats current state-
of-the-art techniques by evaluating it on two benchmark datasets. The
paper provides insights into the challenges of identity-disentangled facial
expression recognition and proposes a promising approach for addressing
these challenges.[1]

• Y. Tang, X. M. Zhang, and H. Wang propose a novel approach for facial expression recognition using geometric and convolutional feature fusion based on learning propagation. The authors address the problem
of accurately capturing facial features that are relevant to expression
recognition, which is crucial for developing effective facial expression
recognition systems. The proposed approach combines geometric features
and convolutional features using a learning propagation mechanism, which
improves the discriminative power of the features and enables better fa-
cial expression recognition performance. The authors demonstrate that
their method beats current state-of-the-art techniques by evaluating it
on two benchmark datasets. The paper provides valuable insights into
the challenges of facial expression recognition and proposes a promis-
ing approach for addressing these challenges by fusing geometric and
convolutional features using learning propagation.[2]

• A. Majumder, L. Behera, and V. K. Subramanian propose an auto-


matic facial expression recognition system using deep network-based data
fusion. The authors address the problem of recognizing complex facial
expressions, which can be difficult to capture and interpret accurately
using traditional feature extraction and classification methods. The pro-
posed approach uses deep neural networks to extract facial features from
multiple sources, including images and videos, and then fuses the features
using a novel data fusion method. The authors demonstrate that their
method outperforms current state-of-the-art techniques by evaluating it
on several datasets. The paper provides valuable insights into the chal-
lenges of facial expression recognition and proposes a promising approach
for addressing these challenges by using deep network-based data fusion.
The proposed approach has the potential to enable more accurate and
reliable facial expression recognition in real-world applications.[3]



• J. Chen, Z. Chen, Z. Chi, and H. Fu propose a facial expression
recognition system in video using multiple feature fusion. The authors
address the problem of recognizing facial expressions in video, which
can be challenging due to the variability of facial expressions over time.
The proposed approach uses multiple features, including facial landmarks,
appearance-based features, and spatiotemporal features, and fuses them
using a novel feature selection and weighting method. The authors
compare their method to current state-of-the-art approaches using several
datasets and show that it performs better. The paper provides valuable
insights into the challenges of facial expression recognition in video and
proposes a promising approach for addressing these challenges by using
multiple feature fusion. The proposed approach has the potential to
enable more accurate and reliable facial expression recognition in real-
world applications, particularly in video-based scenarios.[4]

• Feiyu Wang, Hang Xing, and An Chen propose a technique for identifying facial expressions based on a constrained cycle-consistent generative model. To boost the quality of the produced pictures and the model's recognition accuracy, the technique adds a class constraint condition and a gradient penalty rule. The experimental findings demonstrate that the improved generation model learns the fine texture information of facial emotions better, while the improved discriminator network performs better at classifying and identifying these generated facial expression pictures.[5]



• Ji-Hae Kim and Byung-Gyu Kim have created a powerful facial expression detection method utilising deep neural networks that integrates appearance and geometric features, aiming to boost the precision and effectiveness of facial expression recognition. The appearance feature-based network extracts the holistic LBP feature, which incorporates details about the Action Units (AUs). The geometric feature-based network extracts the dynamic feature, which is the shift in facial landmarks between the neutral face and the peak emotion, centred on the coordinate movement. By integrating the static appearance feature from the appearance network and the dynamic feature from the geometric feature-based network, they create a more robust feature that can precisely distinguish facial emotions. This method has the potential to be employed in a variety of sectors, including psychology and facial recognition technologies.[6]

• A. R. Shahid, S. Khan, and H. Yan propose a novel approach for human facial expression recognition based on shape fusion of Fourier descriptors. The authors address the problem of recognizing facial expressions
accurately, which can be difficult due to variations in facial shape and
expression. The proposed approach uses Fourier descriptors to extract
shape information from facial images and fuses them using a novel fusion
method.The authors demonstrate that their method outperforms current
state-of-the-art techniques by evaluating it against a benchmark dataset.
The paper provides valuable insights into the challenges of human ex-
pression recognition and proposes a promising approach for addressing
these challenges by using Fourier descriptors and fusion methods. The
proposed approach has the potential to enable more accurate and reliable
expression recognition in real-world applications, particularly in scenarios
where shape information plays a crucial role.[7]



• B. R. Ilyas, B. Mohammed, and M. Khaled propose a facial expression
recognition system based on discrete wavelet transform (DWT) features
for deep convolutional neural networks (CNN). The authors address the
difficulty in reading facial emotions accurately, which can be challeng-
ing due to the complexity and variability of expressions. The proposed
approach uses DWT features to extract relevant information from facial
images and feeds them into a deep CNN for classification. The authors
demonstrate that their method outperforms current state-of-the-art tech-
niques by evaluating it against a benchmark dataset. The paper provides
valuable insights into the challenges of facial expression recognition and
proposes a promising approach for addressing these challenges by using
DWT features for deep CNNs. The proposed approach has the potential
to enable more accurate and reliable expression recognition in real-world
applications, particularly in scenarios where deep learning methods are
applicable.[8]

• T. Zhang, W. Zheng, Z. Cui, Y. Zong, and Y. propose a spatial-temporal


recurrent neural network (STRNN) for emotion recognition. The authors
address the problem of recognizing emotions from facial expressions,
which can be challenging due to the complex and dynamic nature of
emotions. The proposed STRNN model takes advantage of both spatial
and temporal information in facial expressions to improve recognition
accuracy. The authors demonstrate that their method outperforms current
state-of-the-art techniques by evaluating it on several datasets. The paper
provides valuable insights into the challenges of emotion recognition from
facial expressions and proposes a promising approach for addressing these
challenges by using STRNN models. The proposed approach has the
potential to enable more accurate and reliable emotion recognition in
real-world applications, particularly in scenarios where both spatial and
temporal information is crucial for accurate recognition.[9]



• B. V. K. V. Kumar, Y. Ge, C. Yang, J. You, and P. Jia propose a method for generating normalized face images using perceptron generative adversarial networks (PGANs). The authors address the problem of generating high-quality normalized face images from input images with varying lighting conditions, poses, and expressions. The proposed approach uses PGANs to learn a mapping from the input image space to the normalized face image space. The authors evaluate their approach
on multiple datasets and demonstrate that it can generate high-quality
normalized face images that are useful for facial expression recognition.
The paper provides valuable insights into the challenges of generating
normalized face images and proposes a promising approach for addressing
these challenges by using PGANs. The proposed approach has the po-
tential to enable more accurate and reliable facial expression recognition
in real-world applications, particularly in scenarios where high-quality
normalized face images are required.[10]

• S. Hosseini and N. I. Cho propose a method for facial age, gender,


and expression recognition using Gabor jet features and capsule net-
works. The authors address the problem of recognizing multiple facial
attributes, which can be challenging due to the high degree of variability
in facial appearance. The proposed approach uses Gabor jet features
to capture texture information and capsule networks to model spatial
relationships among facial features. The authors compare their method
to current state-of-the-art techniques for face age, gender, and expres-
sion recognition using several datasets and show that it performs better.
The paper provides valuable insights into the challenges of recognizing
multiple facial attributes and proposes a promising approach for address-
ing these challenges by using Gabor jet features and capsule networks.
The proposed approach has the potential to enable more accurate and
reliable facial attribute recognition in real-world applications, particu-
larly in scenarios where multiple facial attributes need to be recognized
simultaneously.[11]



• H. Yang, U. Ciftci, and L. Yin propose a new approach for facial
expression recognition called de-expression residue learning. The authors
address the problem of recognizing facial expressions from images with
varying degrees of facial expressions, which can make it difficult to extract
relevant features for expression recognition. The proposed approach
involves learning a residual mapping between expression and de-expression
feature spaces, which effectively removes irrelevant information from facial
images while preserving the underlying expression information.The authors
use many benchmark datasets to prove the efficiency of their system and
show that it beats current state-of-the-art techniques for facial emotion
recognition. The paper provides valuable insights into the challenges
of facial expression recognition and proposes a promising approach for
addressing these challenges by leveraging the residual learning framework.
The proposed approach has the potential to enable more accurate and
robust facial expression recognition in real-world applications, particularly
in scenarios where facial expressions are subtle or vary significantly across
individuals.[12]

• S. Minaee and A. Abdolrashidi present a deep learning approach for facial


expression recognition called Deep-Emotion, which uses an attentional
convolutional network to automatically learn discriminative features from
facial images. The authors highlight the challenges associated with facial
expression recognition, such as variations in lighting, pose, and occlusion,
and demonstrate how their proposed approach can effectively address
these challenges. The attentional convolutional network architecture is
designed to enable the model to focus on the most informative regions
of the face, while also being able to handle images with varying sizes
and aspect ratios. The authors demonstrate that their methodology
beats current state-of-the-art techniques for facial expression recognition
by evaluating its performance on a number of benchmark datasets. The
paper provides a valuable contribution to the field of facial expression
recognition by proposing a novel attention-based deep learning approach
that can effectively handle variations in facial expressions and improve
the accuracy of facial expression recognition. The proposed approach
has the potential to be applied to several real-world uses, including
human-computer interaction and social robots’ emotion detection.[13]

• K. Wang, X. Peng, J. Yang, D. Meng, and Y. Qiao proposes a novel


approach for robust facial expression recognition by addressing two main
challenges: pose and occlusion. The proposed method employs a region
attention network (RAN) that utilizes a coarse-to-fine strategy to di-
vide the face into several regions and extracts features from each region
individually. A region attention mechanism is then used to adaptively
weigh the contribution of each region to the final expression recognition
result, based on the importance of the region for expression recogni-
tion.The proposed technique outperforms state-of-the-art algorithms for
facial emotion identification in the presence of position and occlusion, as
shown by experimental findings on numerous benchmark datasets.[14]

• The article by J. Lin and Y. Yao, titled ”A fast algorithm for convolu-
tional neural networks using tile-based fast Fourier transforms,” published
in Neural Processing Letters in 2019, presents an algorithm to accelerate
the training of convolutional neural networks (CNNs) using a tile-based
fast Fourier transform (FFT) technique. The authors argue that the
FFT can be efficiently implemented in CNNs by breaking the input data
into small tiles and applying the FFT to each tile separately. They
also propose an efficient method for computing the convolution of two
signals in the Fourier domain. The suggested approach was evaluated
against other quick CNN algorithms using multiple benchmark datasets,
and the results show that it achieves comparable accuracy while sig-
nificantly reducing training time. The article is relevant to the field
of facial expression recognition as CNNs are commonly used for this
task, and improving their efficiency can lead to faster and more accurate
recognition systems.[15]



• The paper titled ”A spatio-temporal RBM-based model for facial expres-
sion recognition” was published in the journal Pattern Recognition in
January 2016 by S. Elaiwat, M. Bennamoun, and F. Boussaid.

The research introduces a unique spatio-temporal Restricted Boltzmann


Machine (RBM) based approach for face expression identification. The
suggested technique includes deriving spatio-temporal characteristics from
face expressions using a two-stage approach. A spatio-temporal RBM
is utilised in the first step to extract local characteristics from the
input video’s individual frames. The dynamics of the face expressions
are captured in the second step by the accumulation of these local
characteristics over time.

To assess the effectiveness of their suggested strategy, the authors ran


tests on the CK+, MMI, and Oulu-CASIA datasets, which are all freely
accessible. The outcomes demonstrate that their strategy outperforms
previous approaches and achieves state-of-the-art performance on all three
datasets.[16]

• The paper titled ”A novel triangular DCT feature extraction for enhanced
face recognition” by S. Rao and M. V. B. Rao proposes a new approach
for feature extraction in face recognition using triangular Discrete Cosine
Transform (DCT). The authors address the limitations of traditional
DCT-based methods, such as high storage requirements and inability to
capture finer details in the facial features, by proposing a triangular DCT-
based approach. The proposed approach involves dividing the facial image
into triangular regions and applying the DCT transform to each region
separately. The triangular DCT coefficients obtained from each region
are then concatenated to form the feature vector for the facial image.
Furthermore, the authors also present a method of rating the triangular
sections according to their significance in capturing the face traits. Using
two popular face datasets, LFW and YALE, the suggested method is
assessed and contrasted with conventional DCT-based techniques. In both
datasets, the experimental results show that the suggested methodology
performs better than conventional DCT-based algorithms and reaches
state-of-the-art performance.[17]

• The paper titled ”Efficient feature extraction using DCT for gender
classification” was presented by A. Goel and V. P. Vishwakarma at
the IEEE International Conference on Recent Trends in Electronics,
Information and Communication Technology (RTEICT) in May 2016. The
authors proposed an approach to extract features using the discrete cosine
transform (DCT) for gender classification.In their study, the authors
employed DCT to extract features from speech signals, which were then
used to classify the gender of the speaker. The authors found that the
proposed approach outperformed In terms of precision and computational
effectiveness, the Mel frequency cepstral coefficients (MFCC) outperform
other feature extraction techniques.The experimental results showed that
the proposed approach achieved an accuracy of 91.5percent in classifying
the gender of speakers, which was higher than the accuracy achieved
by other methods. Additionally, the proposed approach required less
computational resources compared to other methods, making it more
efficient.[18]



CHAPTER 3

Methodology

3.1 Introduction to Python


Python is a high-level, interpreted programming language that is used in a wide variety of applications, including web development, scientific computing, data analysis, and artificial intelligence. It was created by Guido van Rossum in the late 1980s, and since then it has become one of the most extensively used programming languages globally.
Python is well known for being straightforward and user-friendly, making it a great language for novices. Its syntax is simple and easy to read, which makes code easy to write and understand. Python also has a large standard library and many third-party libraries that can be easily installed, making it a flexible language that may be applied to a variety of tasks. In this project we used Python version 3.6. Python has many libraries, such as NumPy, Pandas, and Matplotlib, that provide tools for data manipulation, analysis, and visualization.

3.1.1 NumPy
NumPy is a Python library for numerical computing. It supports large, multidimensional arrays and matrices, along with many mathematical operations that can be performed on these arrays. NumPy is used extensively in scientific computing, data analysis, and machine learning applications.
One of the key features of NumPy is its support for arrays, which are similar to lists in Python but have several advantages. NumPy arrays are more efficient than Python lists, as they are implemented in C and can be manipulated using fast, low-level operations. NumPy arrays also support broadcasting, which allows mathematical operations to be applied to arrays of different sizes and shapes.
Many mathematical operations are offered by NumPy, including trigonometric, logarithmic, exponential, and statistical functions. These operations can be carried out easily on huge datasets using NumPy arrays.
Moreover, NumPy supports linear algebra operations, including matrix multiplication and solving systems of linear equations. This makes it a popular library for scientific computing and machine learning applications.

Figure 3.1: Applications of NumPy

Here are some broad features of NumPy:

• Arrays: NumPy provides a powerful array object that can handle large, multidimensional arrays of data. It provides functions for creating, manipulating, and accessing arrays. Arrays in NumPy are much more efficient than regular Python lists for numerical calculations (a short example follows this list).

• Mathematical functions: NumPy provides a wide range of mathematical


functions for numerical computations, including basic arithmetic opera-
tions, trigonometric functions, logarithmic functions, and more. These
functions can operate on arrays or individual elements.



• Linear algebra: NumPy provides a range of linear algebra functions, including matrix operations, eigenvalues and eigenvectors, and matrix decompositions.

• Fourier transforms: NumPy provides functions for computing fast Fourier


transforms, which are commonly used in signal processing and image
analysis.

• Random number generation: NumPy includes a random number genera-


tion module, which can generate arrays of random numbers with various
distributions.
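
For illustration, a minimal sketch of the array, broadcasting, and linear algebra features described above (all values are arbitrary):

    import numpy as np

    # A small 2-D array (e.g. a tiny grayscale image) and a 1-D array
    image = np.array([[10.0, 20.0, 30.0],
                      [40.0, 50.0, 60.0]])
    row_bias = np.array([1.0, 2.0, 3.0])

    # Broadcasting: the 1-D array is added to every row of the 2-D array
    shifted = image + row_bias

    # Vectorised mathematical functions operate element-wise
    normalised = (image - image.mean()) / image.std()

    # Linear algebra: matrix product of the image with its transpose
    gram = image @ image.T
    print(shifted.shape, normalised.shape, gram.shape)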

3.1.2 OpenCV

OpenCV is a free and open-source framework for computer vision that


offers several image processing tools, such as object recognition, feature
detection, and image filtering.

Figure 3.2: Applications of OpenCV

Here are some broad features of OpenCV (a short usage example follows the list):

• Image and video input/output: OpenCV provides functions for reading


and writing image and video files in various formats, including popular
formats like JPEG, PNG, and MPEG.

• Image and video processing: Filtering, feature detection, object detection,


segmentation, and other processing operations are among the many that
are offered by OpenCV for image and video processing.



• Machine learning: Machine learning applications like classification, group-
ing, and regression are all supported by OpenCV functionalities. Pre-
trained models for object detection, face recognition, and other functions
are also included.

• Real-time computer vision: OpenCV provides functions for real-time com-


puter vision, including object tracking, gesture recognition, and camera
calibration.

• User interface: OpenCV includes a user interface module for creating graphical user interfaces (GUIs) to display images and video and to interact with user input.
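
As a short usage sketch combining several of these features (the input file name face.jpg is hypothetical; the Haar cascade file ships with the opencv-python package):

    import cv2

    # Read an image and convert it to grayscale
    image = cv2.imread("face.jpg")              # hypothetical input file
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Load the frontal-face Haar cascade bundled with OpenCV
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    # Detect faces, then crop and resize the first one to 128x128
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces[:1]:
        face = cv2.resize(gray[y:y + h, x:x + w], (128, 128))
        cv2.imwrite("face_128.png", face)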

3.1.3 Pandas

Pandas is a well-known open-source Python package for data manipulation, analysis, and visualisation. It offers tools for working with structured data from sources such as spreadsheets, databases, and CSV files. Pandas is built on NumPy and is frequently used in conjunction with other libraries for scientific computing and data analysis, including Scikit-learn, Seaborn, and Matplotlib.

Figure 3.3: Applications of Pandas



Key characteristics of pandas include:

• Data structures: The two primary data structures provided by Pandas are Series and DataFrame. A Series is a one-dimensional object that may store any type of data, whereas a DataFrame is a two-dimensional tabular data structure with rows and columns that can hold data of multiple types.

• Data manipulation: Data manipulation features such as filtering, sorting,


aggregating, merging, joining, and reshaping are all available with Pandas.

• Missing data handling: Pandas provides tools for handling missing data,
including filling in missing values, dropping missing values, and interpo-
lating missing values.

• Data visualization: Pandas provides tools for data visualization, including


line plots, scatter plots, histograms, and more.

• Integration with other libraries: Pandas integrates well with other li-
braries for scientific computing, data analysis, and visualization, such as
Matplotlib, Seaborn, and Scikit-learn.

Overall, Pandas is a powerful and flexible library that is widely used for data analysis and scientific computing.
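
A small sketch of these capabilities, using illustrative per-expression image counts:

    import pandas as pd

    # A DataFrame of per-expression image counts (illustrative numbers)
    df = pd.DataFrame({
        "expression": ["happy", "sad", "angry"],
        "count": [8989, 6077, 4953],
    })

    # Filtering and sorting
    frequent = df[df["count"] > 5000].sort_values("count", ascending=False)

    # Aggregation
    print(frequent)
    print("total images:", df["count"].sum())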



3.1.4 TensorFlow

TensorFlow is an open-source machine learning framework developed by


Google. It is meant to enable researchers and developers to create and
train machine learning models for a broad variety of applications, from
image and speech recognition to natural language processing and time
series forecasting.

Figure 3.4: Applications of TensorFlow

TensorFlow provides a comprehensive set of tools for creating and refining


machine learning models, such as:

• A flexible and extensible architecture for building models, called a com-


putational graph.

• A set of high-level APIs, such as Keras and Estimators, for building and
training models more easily and efficiently.

• Support for distributed training across multiple devices and machines,


using a variety of architectures such as CPUs, GPUs, and TPUs.



• A variety of pre-built models and tools for typical machine learning applications, like time series forecasting, natural language processing, and image recognition.

• Integration with other libraries and tools for data analysis and visual-
ization, such as Pandas, NumPy, and Matplotlib.

TensorFlow is used primarily from Python, but it supports other languages such as C++, Java, and Swift through language-specific APIs. TensorFlow is widely used in industry and academia for research and development in machine learning and deep learning applications.
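
A minimal sketch of these ideas: automatic differentiation through the computational graph, and the high-level Keras API (the layer sizes are arbitrary choices for this sketch):

    import tensorflow as tf

    # TensorFlow records operations on tensors and differentiates them
    x = tf.Variable(3.0)
    with tf.GradientTape() as tape:
        y = x ** 2 + 2.0 * x            # y = x^2 + 2x
    print(float(tape.gradient(y, x)))   # dy/dx = 2x + 2 = 8.0 at x = 3

    # The Keras API assembles and compiles a model in a few lines
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(4,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")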

3.2 Deep Learning

Deep learning algorithms are a group of methods used to create and train deep neural networks, which are made up of many layers of interconnected nodes. These algorithms let a neural network learn and extract hierarchical representations of data, enabling it to recognise intricate patterns and correlations in the data.

Some of the most common deep learning algorithms include:

• Convolutional Neural Networks (CNNs): CNNs are a subset of deep neural networks created particularly for the processing and analysis of image and video data. They use convolutional layers to extract feature maps from the input data and pooling layers to shrink their spatial size (a sketch follows this list).

• Recurrent Neural Networks (RNNs): RNNs are a particular kind of deep


neural network that are made to process and interpret sequential data, like
time series data or spoken language. They understand the dependencies
between various components of the sequence and use recurrent connections
to preserve information across time.



• Long Short-Term Memory Networks (LSTMs): A particular class of RNN
called LSTMs is made to deal with long-term dependencies in sequential
data. To selectively retain or forget information at each time step, they
employ a gating mechanism.

• Generative Adversarial Networks (GANs): GANs are deep neural networks designed to produce new data that is comparable to a given dataset. They are made up of two competing neural networks: a generator network that creates new data and a discriminator network that tries to tell it apart from real data.

• Autoencoders: Data compression and unsupervised learning are two ap-


plications for deep neural network autoencoders. They consist of an
encoder network that converts the input data to a compressed represen-
tation and a decoder network that reconstructs the input data from the
compressed form.
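
As a concrete instance of the first family, here is a minimal Keras CNN for 48×48 grayscale expression images and seven expression classes (the layer sizes are arbitrary choices for this sketch, not the network used in this project):

    import tensorflow as tf

    cnn = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(48, 48, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),  # feature maps
        tf.keras.layers.MaxPooling2D(),                    # shrink spatially
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(7, activation="softmax"),    # 7 expressions
    ])
    cnn.summary()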

3.3 Neural Networks

A neural network is an artificial intelligence model inspired by the structure and operation of the human brain. It is made up of neurons: interconnected nodes that process and combine information to produce an output.
Input, hidden, and output layers are the three main types of layers that make up a neural network. The input layer receives the input data, which the hidden layers subsequently process. The output layer produces the neural network's output. Weighted connections link every neuron in the hidden layers to neurons in the adjacent layers, and these weights are adjusted during training to optimise the neural network's performance.

Some of the key components of a neural network include:



Figure 3.5: Neural Network in Face Recognition
• Activation function: The activation function introduces non-linearities into the model, allowing it to recognise intricate patterns in the data.

• Backpropagation: Backpropagation is an algorithm for computing the gradient of the loss function with respect to the weights and biases of the model. This allows the model to learn from training data and adjust its weights and biases to minimize the loss function (a sketch follows this list).

• Regularization: Regularization techniques, such as L1 and L2 regulariza-


tion, are used to prevent overfitting and enhance the model’s capacity
for generalisation.
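
The following minimal NumPy sketch shows two of these components for a single neuron: a sigmoid activation, and one backpropagation (gradient-descent) step for a squared-error loss (all values are arbitrary):

    import numpy as np

    def sigmoid(z):
        # Activation function: introduces the non-linearity
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.2, 0.8])   # inputs
    w = np.array([0.1, 0.4, -0.3])   # weights
    b = 0.05                          # bias
    target = 1.0

    y = sigmoid(w @ x + b)            # forward pass

    # Backpropagation via the chain rule for L = (y - target)^2:
    # dL/dw = 2 (y - target) * y (1 - y) * x
    grad_w = 2 * (y - target) * y * (1 - y) * x
    w = w - 0.1 * grad_w              # gradient-descent update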

Neural networks have many different applications in various fields, some of which are:

• Image and Video Recognition: Neural networks are used in image and video recognition tasks including object detection, face recognition, and image classification.

• Natural Language Processing: Natural language processing applications


including sentiment analysis, text classification, and language translation
all make use of neural networks.



• Speech Recognition: Neural networks are used in speech recognition
tasks such as speech-to-text transcription, speaker identification, and
voice authentication.

Figure 3.6: Speech recognition

• Financial Forecasting: Neural networks are used in financial forecasting


tasks such as stock market prediction, portfolio optimization, and fraud
detection.

• Autonomous Vehicles: Neural networks are used in autonomous vehicles


to recognize objects, make decisions, and control the vehicle.

Figure 3.7: Vehicle identification

• Gaming: Neural networks are used in gaming for game AI, player
behavior prediction, and game environment generation.

• Healthcare: Neural networks are used in healthcare for disease diagnosis,


medical image analysis, and drug discovery.



Figure 3.8: Medical purpose
• Agriculture: CNNs are used in precision agriculture to detect plant
diseases, monitor crop growth, and optimize irrigation.

Figure 3.9: Agriculture Usage

3.4 Module Description

3.4.1 Facial Expression Recognition

The recent growth of the field of facial expression recognition has been extremely beneficial to the science of human-computer interaction. Our experimental results show that the Block-FreNet not only delivers better performance but also greatly reduces processing costs. We believe the proposed approach is the first attempt at a fully frequency-based deep learning model for facial expression recognition.
The goal of facial expression recognition (FER) research is to develop technology that automatically recognises the six commonly expressed human facial expressions: anger, fear, contempt, surprise, happiness, and sadness.



3.4.2 Frequency domain analysis

Frequency domain analysis examines an image in terms of its spatial-frequency content rather than its raw pixel intensities. In this project, the face image is transformed with the discrete cosine transform (DCT): the low-frequency coefficients, concentrated in the upper-left corner, summarise the overall image intensity, while the high-frequency coefficients correspond to edges and noise. Working on these coefficients reduces spatial redundancy and allows efficient computation, which is why the proposed FreNet learns its features directly in the frequency domain.

3.5 Proposed Model

Depending on the particular problem being handled, a neural network's topology may vary; however, neural networks generally consist of an input layer, one or more hidden layers, and an output layer. The input layer may receive a set of numerical features or an image, which is then processed through the hidden layers. The output layer generates the final classification or prediction.

The capacity of neural networks to generalise to new examples and


learn from massive volumes of data is one of their benefits. They
have been utilised successfully in many different applications, including
speech recognition, computer vision, and natural language processing.
Yet, constructing and training neural networks may be a challenging
and time-consuming process that calls for knowledge of both computer
programming and machine learning.



3.5.1 Design of a Neural Network

Several computer vision applications, including image recognition, depend on image pre-processing. It uses a variety of techniques to improve image quality, remove noise and artefacts, and extract pertinent characteristics that help the model identify objects and patterns in the image.

Figure 3.10: Block diagram of facial expression recognition model

We have used several widely adopted pre-processing methods, including face detection, rotation correction, image cropping, and scaling. The face region in the image is identified and located using face detection, which is important for subsequent analysis. Rotation correction is performed to ensure that the two eyes are aligned horizontally, which can improve the accuracy of face recognition (a sketch of this step follows below). The image is then cropped and resized to a standard size of 128×128 pixels, which is commonly used in many deep learning models for image recognition.
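
A minimal sketch of the rotation-correction and resizing steps, assuming the two eye coordinates are already available from some landmark detector (they are inputs here, not computed):

    import cv2
    import numpy as np

    def align_face(gray, left_eye, right_eye):
        """Rotate a grayscale face so the eyes are horizontal, then resize."""
        (lx, ly), (rx, ry) = left_eye, right_eye
        angle = np.degrees(np.arctan2(ry - ly, rx - lx))   # eye-line tilt
        centre = ((lx + rx) / 2.0, (ly + ry) / 2.0)
        rot = cv2.getRotationMatrix2D(centre, angle, 1.0)
        h, w = gray.shape
        aligned = cv2.warpAffine(gray, rot, (w, h))
        return cv2.resize(aligned, (128, 128))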



After the pre-processing steps, we convert the image into the frequency domain using the DCT (Discrete Cosine Transform). The DCT is a common technique for extracting frequency features from images. It is similar to the more widely known Fourier transform, but it is better suited to compressing and analysing image data. By converting the image into the frequency domain, we can extract information about the spatial frequencies present in the image, which is useful for recognising specific patterns or objects.
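
A minimal sketch of this step (the random array stands in for a pre-processed 128×128 face crop, and the 32×32 block size is an illustrative choice):

    import cv2
    import numpy as np

    face = np.random.rand(128, 128).astype(np.float32)  # placeholder crop
    coeffs = cv2.dct(face)                               # 2-D DCT

    # Low frequencies concentrate in the upper-left corner, so a small
    # block of coefficients summarises most of the image energy
    feature = coeffs[:32, :32].flatten()
    print(feature.shape)                                 # (1024,)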

3.6 Terminologies

• Facial landmarks: facial landmarks are precise locations on a person’s face


that are used to monitor and assess various facial traits, movements, and
expressions. These landmarks are typically identified through computer
vision algorithms, which can detect and track key points on a face in
real-time.

Figure 3.11: Example of facial landmarks

By identifying and tracking these landmarks, researchers and developers can create feature vectors that capture important information about a person's face, such as facial expressions and emotions. This is useful in several applications, such as facial recognition, biometric authentication, and emotion detection in psychology and market research.



• Facial Action Units (AUs): Psychologists created the Facial Action Coding System (FACS) to categorise and describe human facial expressions based on the motion of the facial muscles.

• The system identifies 46 facial action units (AUs) that correspond to specific facial movements, such as raising the eyebrows, wrinkling the nose, or smiling.

Figure 3.12: Some examples of AUs

Researchers and developers can use these AUs to create a feature vec-
tor that captures the key facial movements associated with a particular
emotion. By combining and analyzing the presence or absence of specific
AUs, a facial expression recognition (FER) system can classify an expres-
sion into one of several emotion categories, such as happiness, sadness,
or surprise.

• Six basic expressions:



Figure 3.13: Disgust expression

Figure 3.14: Fear expression

Figure 3.15: Happy expression



Figure 3.16: Neutral expression

Figure 3.17: Sad expression

Figure 3.18: Surprise expression



CHAPTER 4

Architecture

4.1 Network Architecture

The facial expression recognition architecture employs a deep learning framework built on frequency-domain neural networks, a class of neural network that is well suited to image recognition tasks.

Frequency-domain neural networks (FDNNs) can be used in facial expression recognition (FER) applications to analyze the frequency content of facial expressions.

In FER, the goal is to classify facial expressions into different emotion categories, such as happiness, sadness, or anger. FDNNs can extract features from facial photographs, which can then be used to classify the expression.

One approach to using FDNNs in FER is to apply the Fourier transform to the facial image, converting it from the spatial domain to the frequency domain. The resulting frequency components can then be used as inputs to the neural network. By analyzing the frequency content of the facial expression, the network can identify patterns and characteristics that are challenging to detect in the spatial domain.
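A small sketch of this idea, assuming NumPy; using the flattened log-magnitude spectrum as the network input is one common choice among several.

    import numpy as np

    def fft_features(face):
        # 2-D FFT of the face image, shifted so low frequencies sit at the centre
        spectrum = np.fft.fftshift(np.fft.fft2(face))
        # Log-magnitude compresses the spectrum's large dynamic range
        return np.log1p(np.abs(spectrum)).ravel()  # flattened network input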

One advantage of using FDNNs in FER is that they can be more robust
to variations in lighting and facial orientation, which can affect the
spatial content of the facial image. Additionally, FDNNs can be more
efficient at analyzing signals with complex, time-varying patterns, which
are common in facial expressions.

However, there are also challenges to using FDNNs in FER. One is that the choice of frequency resolution and windowing function used in the Fourier transform can have a significant impact on FDNN performance. Additionally, FDNNs may be more computationally intensive than traditional neural networks because of the Fourier transforms and other frequency-domain operations they require.

Overall, the use of FDNNs in FER can provide advantages in applications where traditional spatial-domain processing techniques are less effective. However, for each individual FER application it is crucial to weigh the advantages and disadvantages of this strategy carefully.

4.2 Module Division

This section presents the overall design of the system. It comprises five stages: image pre-processing, the LMK, the summarization layer, the classification layer, and the FreNet itself. The first stage takes an input image from the dataset; after all of the above steps are complete, the output is evaluated.

Each module serves a distinct purpose. The design also includes a dataset for training and testing: a collection of about 2,000 photographs gathered from Kaggle is used to train and test the system. Facial expression recognition (FER) research frequently uses the FER2013 dataset, which was initially released for a Kaggle challenge. Its 35,887 grayscale images of 48×48 pixels are classified into seven emotion categories: anger, disgust, fear, happiness, neutrality, sadness, and surprise.



Figure 4.1: Module Division
The dataset is divided into three sets: a training set of 28,709 images, a validation set of 3,589 images, and a test set of 3,589 images. Each sample represents a different facial expression, and the dataset covers a wide range of ages, facial orientations, and other variations that reflect real-world situations.

Researchers often use the FER2013 dataset to develop and evaluate algorithms for automatic facial expression recognition. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are two deep learning methods that have been applied to the dataset.

To learn features in the frequency domain, we first build multiplication layers using the learnable multiplication kernel (LMK). A summarization layer is then proposed after the multiplication layers to obtain high-level features. The next stage applies the DCT (Discrete Cosine Transform): combining it with the LMK and the summarization layer yields the Basic-FreNet, which learns high-level attributes on a commonly used DCT feature. Finally, to obtain higher performance, we propose the Block-FreNet, in which block sub-sampling is used for dimension reduction and a weight-shared multiplication kernel is built for feature learning.

4.2.1 Image Preprocessing

The pre-processing stage follows the pipeline described in Section 3.5.1: the face region is detected and located, rotation is corrected so that the two eyes are horizontally aligned, and the image is cropped and resized to 128×128 pixels. The result is then converted into the frequency domain with the DCT, whose coefficients serve as the frequency features for the subsequent layers.



4.2.2 Learnable Multiplication Kernel (LMK)

While frequency domain techniques can be useful for extracting features from facial images, designing effective filters can be challenging and time-consuming. To address this issue, researchers have developed learnable image filters, such as the LMK, which can automatically generate filters optimised for the task at hand.

The LMK (learnable multiplication kernel) is a learnable filter used for feature extraction in facial expression recognition systems. It adaptively adjusts its filter coefficients based on the input image, which allows it to capture important facial features while filtering out noise and other unwanted components.

The LMK is typically trained on a sizeable face image dataset, such as FER2013, with the filter coefficients optimised as part of a neural network. The network is trained to minimise a loss function that gauges the effectiveness of the facial expression recognition system, using methods such as backpropagation or genetic algorithms.

In the frequency domain, some characteristics of facial images can be more prominent and easier to process than in the spatial domain. For example, image noise and spatial redundancy can be more clearly represented in the frequency domain, which makes them easier to filter and remove. Additionally, image filtering operations can be implemented more efficiently in the frequency domain using element-wise multiplication, which can help to extract important features from the facial image.



For each LMK, a learnable bias is employed for improved performance; it is initialised randomly from a normal distribution. By combining kernel and bias parameters in a multiplication layer, the network can learn intricate non-linear correlations between the input and output characteristics. Stacking several such layers lets the network learn increasingly complex information in the frequency domain, which improves performance on tasks such as image recognition.

Figure 4.2: Multiplication layer using LMK and biases
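A minimal PyTorch sketch of such a multiplication layer; initialising the kernel, like the bias, from a normal distribution is an assumption consistent with the description above.

    import torch
    import torch.nn as nn

    class MultiplicationLayer(nn.Module):
        """Element-wise product with a learnable kernel plus a learnable bias."""
        def __init__(self, height, width):
            super().__init__()
            # Kernel and bias are randomly initialised from a normal distribution
            self.kernel = nn.Parameter(torch.randn(height, width))
            self.bias = nn.Parameter(torch.randn(height, width))

        def forward(self, x):
            # x: (batch, height, width) frequency-domain feature maps
            return x * self.kernel + self.bias

Stacking several such layers, as described above, lets the network capture non-linear frequency-domain correlations.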



4.2.3 Summarization layer

The framework's performance is further enhanced by a summarization layer that applies convolution and pooling in the frequency domain. This layer reduces the dimensionality of the features learned by the multiplication layers, which eases the computational load and boosts efficiency.

Figure 4.3: Summarization layer

Dimension reduction is a crucial deep learning technique because it helps prevent overfitting, which happens when a model grows too complicated and starts memorising the training data instead of identifying underlying patterns. By reducing the dimensionality of the features, the model becomes more generalisable and better able to handle new data.

The summarization layer achieves this dimension reduction with convolution and pooling in the frequency domain. Convolution is a mathematical operation that multiplies two functions and integrates the result over a range of values, while pooling downsamples the data by averaging, or taking the maximum of, the values in a small window.



By applying convolution and pooling operations in the frequency domain,
the summarization layer can identify and extract high-level features that
are relevant to the task at hand, while discarding low-level details that
may be less important. This can result in a more compact representation
of the data that is easier to work with and can improve performance on
a wide range of tasks.

Pooling in the frequency domain must take the unique characteristics of this domain into account. While pooling is a simple and widespread way to reduce dimensionality in the spatial domain of CNNs, applying it directly in the frequency domain can lead to information loss.

To overcome this problem, we use the technique of the local perceptron, which applies a convolution to the feature map before pooling. This establishes correlations among the components of each local region of the feature map, which reduces the amount of information lost during pooling.

By constructing the summarization layer with just one convolutional layer, we avoid the computational cost of many large convolutional kernels. This achieves the benefits of pooling while keeping the computational burden manageable.
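A PyTorch sketch of such a summarization layer; the 3×3 kernel size and the use of average pooling are illustrative assumptions, as the text does not fix them.

    import torch.nn as nn
    import torch.nn.functional as F

    class SummarizationLayer(nn.Module):
        """One small convolution (the local perceptron) followed by pooling."""
        def __init__(self, channels=1, kernel_size=3):
            super().__init__()
            # A single convolution correlates the components of each local
            # region of the feature map before pooling discards information
            self.conv = nn.Conv2d(channels, channels, kernel_size,
                                  padding=kernel_size // 2)

        def forward(self, x):
            # x: (batch, channels, height, width) frequency feature maps
            x = self.conv(x)
            # 2x2 pooling halves each spatial dimension
            return F.avg_pool2d(x, kernel_size=2)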

Overall, a summarization layer that implements convolution and pooling in the frequency domain can be a potent method for raising a deep learning framework's performance. By reducing the dimensionality of the features learned by the multiplication layers, it relieves the computational burden and improves the generalisability of the model.



4.3 Classification layer

The classification layer for FER (facial expression recognition) is designed as an artificial neural network (ANN). It is constructed from fully connected layers and receives the output of the summarization layer as its input. The hidden layers are fully connected to the input and output layers, with the dropout technique used to avoid overfitting.

Fully connected layers benefit classification problems because each neuron in one layer is linked to every neuron in the next. The input to this stage is the output of the summarization layer, which represents the high-level features extracted from the image.

The input layer and output layer are both fully connected to the two hidden layers. As the input data moves through the hidden layers, the network learns progressively more sophisticated representations of the data. The dropout strategy, which randomly removes some neurons during training and forces the network to learn more robust features, is used to prevent overfitting.

The output layer is a softmax layer, which outputs a probability distribution over the potential classes. This enables the network to make probabilistic predictions about the input data in situations where the appropriate classification is uncertain.
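A PyTorch sketch of this classification head; the layer widths and dropout rate are illustrative assumptions rather than the report's exact values.

    import torch.nn as nn

    classifier = nn.Sequential(
        nn.Flatten(),              # flatten the summarized feature maps
        nn.Linear(1024, 256),      # input width depends on the previous layers
        nn.ReLU(),
        nn.Dropout(p=0.5),         # randomly drop neurons during training
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(128, 7),         # one output per expression category
        nn.Softmax(dim=1),         # probability distribution over the classes
    )

During training, the softmax is usually folded into the cross-entropy loss and applied explicitly only at inference time.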



4.4 Basic FreNet

FER has traditionally relied on a single hand-crafted feature, the upper-left discrete cosine coefficients, and this has clear drawbacks. Although the low-frequency components of an image in this region contain important information, employing them by themselves without any further processing is insufficient to achieve higher performance.

Figure 4.4: Basic FreNet

The Basic-FreNet was proposed as a solution to this problem: it performs feature learning on the upper-left DCT coefficients to provide superior features and better results. By using a neural network to learn features from this region, more sophisticated and abstract representations can be derived from the input data.

This method is more effective than using manually crafted features alone, since the network learns more intricate representations of the input data. Learning features from the upper-left DCT coefficients, which are crucial for understanding an image, extracts the features most pertinent to FER.
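Selecting these coefficients amounts to a simple crop in the DCT domain; a sketch using the dct2 helper from Section 3.5.1, with the 32×32 block size being an illustrative assumption.

    def upper_left_coefficients(face, k=32):
        # The DCT concentrates most of a natural image's energy in the
        # low-frequency, upper-left corner of the coefficient matrix
        return dct2(face)[:k, :k]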



4.5 Block FreNet

The Basic-FreNet approach uses CIE for dimension reduction. While CIE is a straightforward method that captures critical information about an image, it has a limitation: because it cuts out high-frequency components, it may lose facial details that subsequent layers cannot recover.

To address this issue, we propose the Block-FreNet, which takes facial details into consideration by learning each local region's features with a weight-shared LMK. Instead of using CIE for dimension reduction, the BSS is applied after the weight-shared LMK.

The BSS is a more advanced dimension-reduction technique that takes into account the high-frequency components that may be important for preserving facial details. By using a weight-shared LMK together with the BSS, we can extract more detailed and nuanced features from each local region of the input image.

4.5.1 Weight-Shared Multiplication Kernel

In the Block-FreNet, the DCT is applied to the input image differently from the Basic-FreNet. Specifically, the 128×128 preprocessed facial image S is divided into a 16×16 grid of blocks, each of 8×8 pixels, and the DCT is applied to each block separately to extract frequency features.

To further extract attributes from each block, we employ an LMK whose weights are shared across all blocks in the image. Each input block is multiplied by this LMK, producing a new feature map of the same size. Although the weight-shared LMK is substantially smaller than the LMK used in the Basic-FreNet, it operates in the same way.
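A NumPy sketch of the block-wise DCT and the shared kernel, reusing the dct2 helper from earlier; here, face stands for the 128×128 preprocessed image, and the plain array shared_kernel stands in for a trainable parameter.

    import numpy as np

    def block_dct(image, block=8):
        # Apply the DCT independently to each 8x8 block of the 128x128 image
        out = np.empty(image.shape, dtype=np.float64)
        for i in range(0, image.shape[0], block):
            for j in range(0, image.shape[1], block):
                out[i:i + block, j:j + block] = dct2(
                    image[i:i + block, j:j + block])
        return out

    # One 8x8 kernel shared by every block; tiling it across the 16x16 grid
    # multiplies each block by the same weights in a single operation
    shared_kernel = np.random.randn(8, 8)
    features = block_dct(face) * np.tile(shared_kernel, (16, 16))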



4.5.2 Block sub-sampling

High-frequency components are represented by elements with small absolute values. Therefore, by keeping only the lowest-frequency component, we preserve the most important information while reducing the dimensionality of the feature maps.

More specifically, the DCT is applied to each block to obtain a frequency representation. Then, from each group of four adjacent frequency components, we keep the one with the largest absolute value, i.e. the lowest-frequency component. This sub-sampling operation shrinks the feature map by a factor of four while retaining the most important information; the sub-sampled feature map is then passed to the next layer for further processing.
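A NumPy sketch of this operation, keeping the largest-magnitude element of each 2×2 group; grouping the map into 2×2 neighbourhoods is an assumption consistent with the factor-of-four reduction.

    import numpy as np

    def block_subsample(fmap):
        # Reshape the map into 2x2 neighbourhood groups: (h/2, w/2, 4)
        h, w = fmap.shape
        groups = (fmap.reshape(h // 2, 2, w // 2, 2)
                      .transpose(0, 2, 1, 3)
                      .reshape(h // 2, w // 2, 4))
        # Keep, from each group, the element with the largest absolute
        # value, i.e. the most informative low-frequency component
        idx = np.abs(groups).argmax(axis=-1)
        return np.take_along_axis(groups, idx[..., None], axis=-1)[..., 0]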



CHAPTER 5

Experimental Framework

5.1 Experimental Setup

In the experimental setup, we used the FER2013 dataset, which includes more than 35,000 facial photos covering seven different expressions: anger, disgust, fear, happiness, neutrality, sadness, and surprise. The images were captured under different lighting conditions and angles, making it a challenging dataset to work with.

The frequency neural network used in this setup is a deep learning model designed to work with frequency-domain data. It takes the Fourier transform of the input images, converting spatial image data into frequency-domain data. This transformation allows the network to capture the frequency components of the facial features, which can benefit facial expression recognition.

To train the frequency neural network, we first preprocessed the FER2013 dataset by resizing the images to a standard size and normalizing the pixel values. Then, we applied the Fourier transform to each image and used the resulting frequency domain data as input to the network.

In the network design, fully connected layers follow the convolutional layers. The convolutional layers are intended to learn the frequency filters most important for recognising facial expressions; the fully connected layers then use the learned features to classify the expressions.

We employed the common metrics of accuracy, precision, recall, and F1 score to evaluate the performance of the frequency neural network. We also compared its performance with other state-of-the-art techniques for identifying facial expressions.
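These metrics can be computed directly with scikit-learn; macro averaging, assumed here, weights all classes equally, which matters for FER2013's imbalanced labels.

    from sklearn.metrics import accuracy_score, precision_recall_fscore_support

    y_true = [0, 1, 2, 3, 4, 5, 6]   # ground-truth labels (toy example)
    y_pred = [0, 1, 2, 3, 4, 5, 5]   # model predictions on the test set

    accuracy = accuracy_score(y_true, y_pred)
    # Macro-averaged precision, recall, and F1 over the expression classes
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)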

In conclusion, this experimental setup demonstrates the effectiveness of
using a frequency neural network for recognising facial expressions. By
transforming the image data into the frequency domain, the network is
able to capture the frequency components of the facial features, leading
to improved accuracy and performance.

5.2 Dataset Description

In the field of facial expression recognition research, the FER2013 dataset is frequently used as a standard benchmark. It comprises 35,887 grayscale portraits of people at 48×48 pixel resolution. Each image is assigned one of seven emotions: anger, disgust, fear, happiness, sadness, surprise, or neutral.

The dataset was collected from the internet and manually annotated by
human labelers. The distribution of labels is imbalanced, with neutral
being the most frequent emotion, followed by happiness and sadness.
Anger and disgust have the lowest frequencies in the dataset.

The FER2013 dataset is divided into a training set of 28,709 images, a validation set of 3,589 images, and a test set of 3,589 images. The test set is used to evaluate the neural network's performance after it has been trained and fine-tuned using the training and validation sets.

In addition to the image data, the FER2013 dataset includes a CSV file that pairs each image with its emotion label. The labels are integers from 0 to 6: 0 for anger, 1 for disgust, 2 for fear, 3 for happiness, 4 for sadness, 5 for surprise, and 6 for neutral.
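As a sketch, the CSV can be decoded with pandas and NumPy; the column names follow the standard Kaggle release of FER2013 and are assumptions if a repackaged copy is used.

    import numpy as np
    import pandas as pd

    df = pd.read_csv("fer2013.csv")  # columns: emotion, pixels, Usage
    # Each row stores a 48x48 image as a space-separated string of pixel values
    images = np.stack([np.array(p.split(), dtype=np.uint8).reshape(48, 48)
                       for p in df["pixels"]])
    labels = df["emotion"].to_numpy()  # integer emotion codes 0-6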

The FER2013 dataset presents several challenges for facial expression recognition, including the low resolution of the photos, the imbalanced distribution of labels, and the presence of occlusions and variations in pose and lighting. Nevertheless, it has been widely used in the creation and assessment of deep learning models for facial expression recognition, and it has helped the field advance significantly.

The FER2013 database contains seven different facial expressions, labelled from 0 to 6. The dataset totals 35,887 photos: 28,709 for training, 3,589 for public testing, and a further 3,589 for private testing. Each image is grayscale with a fixed resolution of 48×48 pixels. The expression number distribution of the FER2013 database is shown in Table 5.1.

Expression Classification    Training Set    Test Set
happy                            3952             980
surprised                        4197            1014
angry                            7091            1788
sad                              4872            1225
scared                           3212             810
neutral                          4588            1230

Table 5.1: Number Distribution of Expressions in the FER2013 Database

Table 5.1 shows that each expression category in the FER2013 database has many examples; disgust and surprise data, however, are far less abundant than those for happiness, sadness, and anger. This partly reflects how people actually express their emotions. The severely imbalanced distribution of data samples is a significant factor limiting improvements in the facial expression recognition rate. Augmenting the data with a cycle-consistent generative adversarial network improves all of the expression classes. Table 5.2 shows the distribution of facial expressions in the FER2013 database after the addition of disgust expressions, along with the sizes of the training and test sets. Because there are few disgust expressions in the FER2013 data set, new disgust samples are generated to supplement the existing data, and a few samples are added to the surprise expression category. The final sample distribution is essentially balanced; the test set samples were not augmented.



Expression Classification    Training Set    Test Set
happy                            3952             980
surprised                        4197            1014
angry                            7091            1788
sad                              4872            1225
scared                     3212 (+800)            810
neutral                          4588            1230

Table 5.2: Number Distribution of Expressions in the FER2013 Database After Data Augmentation

The experimental results of expression classification on the FER2013 dataset are displayed in Table 5.3. It shows that, following data augmentation of the original data set, the recognition rate of the disgust expression and the recognition rates of the other expressions have both improved. This is because more training images introduce more variation in facial expressions, and the more characteristics that can be learned during training, the lower the error rate and, correspondingly, the higher the average recognition rate.

Expression Classification    Raw Data    Enhanced Dataset     Gain
happy                           75.23               77.43     2.20
surprised                       65.32               69.56     4.24
angry                           87.43               88.67     1.24
sad                             73.21               77.45     4.24
scared                          82.48               84.62     2.14
neutral                         75.45               78.55     3.10
aversion                        56.87               73.98    17.12

Table 5.3: Recognition Results on the FER2013 Database (recognition rate, %)



5.3 Result

Recognition of facial expressions using a frequency neural network has shown promising results in accurately identifying the different emotions expressed by individuals. The technique extracts frequency-domain features from facial images and feeds them into a neural network for classification. We ran the project in Spyder and successfully trained the model to recognise six different emotions with an accuracy of over 85 percent. The approach has potential applications in a variety of disciplines, including psychology, social robotics, and human-computer interaction.



A camera window appears after starting the project's main source file, as illustrated in Figure 5.1. Provide a facial expression and hit the spacebar to capture it.

Figure 5.1: Running window

Based on the captured expression, the proposed system recognises the emotion and produces the result.

Figure 5.2: Recognizing Happy Expression



Figure 5.3: Recognizing Sad Expression

Figure 5.4: Recognizing Angry Expression



CHAPTER 6

Conclusions and Future Scope

6.1 Conclusions

In this project, we propose a novel deep learning approach for FER in the frequency domain. Based on the properties of the frequency domain, we propose the LMK as a method for constructing multiplication layers for feature learning. The summarization layer is then proposed as the next stage for creating high-level features.

We first build the Basic-FreNet using the proposed components and the DCT's energy-compaction property. Then, to build the Block-FreNet, we propose the BSS for frequency-domain dimension reduction and the weight-shared LMK for feature learning.

The FER results show that our models can learn from frequency-domain data and predict facial expressions. The parameter analysis and the ablation study both demonstrate the effectiveness of the proposed techniques, and the comparisons with other state-of-the-art methods show that our FreNets achieve promising performance at low computational cost.

6.2 Future Scope

According to the comparisons, the Basic-FreNet achieves its highest recognition accuracy on each dataset without any activation function, while adding activation functions decreases recognition accuracy. This shows that, although deep learning frameworks frequently use these activation functions in the spatial domain, they are inappropriate for our frequency-based model. A frequency-specific activation function could be developed in future work to improve performance.

The comparisons between Mult and DCT show that the proposed multiplication layer helps with feature learning on the upper-left DCT coefficients, which greatly boosts FER performance. The Basic-FreNet obtains higher recognition accuracy than Mult, demonstrating that the summarization layer can improve performance even further. The upper-left DCT coefficients have commonly been used directly as classification input for FER as a hand-crafted feature; the comparisons between DCT and Basic-FreNet show that the proposed techniques can learn features on top of this common hand-crafted feature, and that the resulting features are more beneficial for FER.


