BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
Submitted by
SUPERVISOR
Dr. G. A. E. Satish Kumar
Professor and Head, Dept. of ECE
March, 2023
Department of Electronics and Communication Engineering
We avail this opportunity to express our deep sense of gratitude and heartfelt thanks to Dr. Teegala Vijender Reddy, Chairman, and Sri Teegala Upender Reddy, Secretary, of VCE, for providing a congenial atmosphere to complete this project successfully.
Table of Contents
3.6 Terminologies
CHAPTER 4 Architecture
4.1 Network Architecture
4.2 Module Division
4.2.1 Image Preprocessing
4.2.2 Learnable Multiplication Kernel (LMK)
4.2.3 Summarization Layer
4.3 Classification Layer
4.4 Basic FreNet
4.5 Block FreNet
4.5.1 Weight-Shared Multiplication Kernel
4.5.2 Block Sub-sampling
CHAPTER 5 Experimental Framework
5.1 Experimental Setup
5.2 Dataset Description
5.3 Result
CHAPTER 6 Conclusions and Future Scope
6.1 Conclusions
6.2 Future Scope
REFERENCES
CHAPTER 1 Introduction
1.1 Background Study
Facial expressions have been an important means of communication for
humans since the beginning of human evolution. However, the study of facial
expressions as a scientific field is relatively recent.
In ancient times, philosophers such as Aristotle and Plato wrote about
the relationship between facial expressions and emotions. In the Middle Ages,
the Italian philosopher and theologian Thomas Aquinas argued that emotions
could be recognized by changes in facial expression.
It wasn't until the 19th century that scientists began to study facial expressions in a systematic way. One of the first scientists to recognize the significance of facial expressions in human communication was Charles Darwin. In his book "The Expression of the Emotions in Man and Animals", published in 1872, Darwin suggested that many facial expressions are common across cultures and are innate rather than learned.
In the early 20th century, psychologists such as William James and Carl
Jung continued to study facial expressions and their connection to emotions.
In the 1960s, psychologist Paul Ekman began studying facial expressions in
different cultures and developed a system for categorizing different feelings
based on facial muscle movements.
Today, the study of facial expressions is a significant field in disciplines including psychology, neuroscience, and computer science. Researchers have developed sophisticated methods for analyzing and categorizing facial expressions, and have made significant strides in understanding the neural and psychological mechanisms underlying facial expressions and their connection to emotions.
CHAPTER 2 Literature Survey
• Y. Tang, X. M. Zhang, and H. Wang propose a novel approach for facial expression recognition using geometric-convolutional feature fusion based on learning propagation. The authors address the problem of accurately capturing facial features that are relevant to expression recognition, which is crucial for developing effective facial expression recognition systems. The proposed approach combines geometric features and convolutional features using a learning propagation mechanism, which improves the discriminative power of the features and enables better facial expression recognition performance. The authors demonstrate that their method outperforms current state-of-the-art techniques by evaluating it on two benchmark datasets. The paper provides valuable insights into the challenges of facial expression recognition and proposes a promising approach for addressing them by fusing geometric and convolutional features using learning propagation. [2]
• Feiyu Wang, Hang Xing, and An Chen have proposed a new technique for identifying facial expressions based on a constrained cycle-consistent generative model. To boost the quality of the generated images and the model's recognition accuracy, the technique adds a class-constraint condition and a gradient-penalty rule. The experimental findings demonstrate that the improved generative model learns the fine texture information of facial expressions better, while the improved discriminator network performs better at classifying and identifying these augmented facial expression images. [5]
• The article by J. Lin and Y. Yao, titled "A fast algorithm for convolutional neural networks using tile-based fast Fourier transforms," published in Neural Processing Letters in 2019, presents an algorithm to accelerate the training of convolutional neural networks (CNNs) using a tile-based fast Fourier transform (FFT) technique. The authors argue that the FFT can be implemented efficiently in CNNs by breaking the input data into small tiles and applying the FFT to each tile separately. They also propose an efficient method for computing the convolution of two signals in the Fourier domain. The suggested approach was evaluated against other fast CNN algorithms on multiple benchmark datasets, and the results show that it achieves comparable accuracy while significantly reducing training time. The article is relevant to facial expression recognition because CNNs are commonly used for this task, and improving their efficiency can lead to faster and more accurate recognition systems. A brief sketch of the underlying convolution-theorem idea appears at the end of this survey. [15]
• The paper titled "A novel triangular DCT feature extraction for enhanced face recognition" by S. Rao and M. V. B. Rao proposes a new approach for feature extraction in face recognition using the triangular Discrete Cosine Transform (DCT). The authors address the limitations of traditional DCT-based methods, such as high storage requirements and an inability to capture finer details in the facial features, by proposing a triangular DCT-based approach. The proposed approach involves dividing the facial image into triangular regions and applying the DCT to each region separately. The triangular DCT coefficients obtained from each region are then concatenated to form the feature vector for the facial image. Furthermore, the authors present a method for ranking the triangular regions according to their significance in capturing facial traits. Using two popular face datasets, LFW and YALE, the suggested method is assessed and contrasted with conventional DCT-based techniques. On both datasets, the experimental results show that the suggested methodology outperforms the conventional approaches. [17]
• The paper titled "Efficient feature extraction using DCT for gender classification" was presented by A. Goel and V. P. Vishwakarma at the IEEE International Conference on Recent Trends in Electronics, Information and Communication Technology (RTEICT) in May 2016. The authors proposed an approach to extract features using the discrete cosine transform (DCT) for gender classification. In their study, the authors employed the DCT to extract features from speech signals, which were then used to classify the gender of the speaker. The authors found that the proposed approach outperformed other feature extraction techniques, such as Mel frequency cepstral coefficients (MFCC), in terms of precision and computational efficiency. The experimental results showed that the proposed approach achieved an accuracy of 91.5% in classifying the gender of speakers, which was higher than the accuracy achieved by other methods. Additionally, the proposed approach required fewer computational resources than other methods, making it more efficient. [18]
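To make the Fourier-domain convolution idea from Lin and Yao's work concrete, the following is a minimal NumPy sketch of the convolution theorem that underlies FFT-based CNN layers: circular convolution in the signal domain equals element-wise multiplication in the frequency domain. It illustrates only the general principle, not the authors' tile-based algorithm; the signal length and random data are placeholders.

    import numpy as np

    n = 64
    x = np.random.rand(n)                     # input signal
    k = np.random.rand(n)                     # filter of the same length

    # Convolution via the frequency domain: FFT, multiply, inverse FFT.
    fft_conv = np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)).real

    # Direct circular convolution for comparison.
    direct = np.array([sum(x[j] * k[(i - j) % n] for j in range(n))
                       for i in range(n)])

    assert np.allclose(fft_conv, direct)      # both routes agree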
CHAPTER 3 Methodology
3.1.1 NumPy
A Python library for computational mathematics is called NumPy. Large,
multidimensional arrays and matrices are supported, along with Several complex
mathematical calculations that can be performed on these arrays. NumPy is
used extensively in scientific computing, data analysis, and machine learning
applications.
One of the key features of NumPy is its support for arrays, which are
similar to lists in Python but have several advantages. NumPy arrays are
more efficient than Python lists, as they are implemented in C and can
be manipulated using fast, low-level operations. NumPy arrays also support
broadcasting, which allows mathematical operations to be applied to arrays of
different sizes and shapes.
NumPy offers many mathematical operations, including trigonometric, logarithmic, exponential, and statistical functions. Combined with NumPy arrays, these functions make it easy to perform such computations on large datasets.
Moreover, NumPy supports linear algebra operations, including matrix multiplication and solving systems of linear equations. This makes it a popular library for scientific computing and machine learning applications.
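As a brief illustration of the features described above, the following sketch shows NumPy arrays, broadcasting, vectorised functions, and a linear-algebra solve; the values are arbitrary placeholders.

    import numpy as np

    a = np.arange(6).reshape(2, 3)        # 2 x 3 array
    b = np.array([10, 20, 30])            # 1-D array of length 3

    print(a + b)                          # broadcasting stretches b across the rows

    print(np.exp(a))                      # vectorised mathematical function
    print(a.mean(), a.std())              # statistical operations

    M = np.array([[3.0, 1.0], [1.0, 2.0]])
    y = np.array([9.0, 8.0])
    print(np.linalg.solve(M, y))          # solves Mx = y, giving [2. 3.]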
3.1.2 OpenCV
OpenCV is an open-source library for computer vision and image processing. Among its features, it provides:
• graphical user interfaces (GUIs) to display images and video, and to interact with user input.
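A minimal preprocessing sketch using standard OpenCV calls is given below. The file name face.jpg is a placeholder, and the particular steps (grayscale conversion, resizing to 48 by 48, histogram equalisation) are typical choices rather than this project's exact pipeline.

    import cv2

    img = cv2.imread('face.jpg')                   # 'face.jpg' is a placeholder path
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # convert to grayscale
    gray = cv2.resize(gray, (48, 48))              # FER2013-style resolution
    gray = cv2.equalizeHist(gray)                  # normalise illumination

    cv2.imshow('preprocessed face', gray)          # simple GUI window
    cv2.waitKey(0)
    cv2.destroyAllWindows()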
3.1.3 Pandas
Pandas is a Python library for data manipulation and analysis. Its key features include:
• Missing data handling: Pandas provides tools for handling missing data, including filling in missing values, dropping missing values, and interpolating missing values (see the sketch after this list).
• Integration with other libraries: Pandas integrates well with other li-
braries for scientific computing, data analysis, and visualization, such as
Matplotlib, Seaborn, and Scikit-learn.
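As referenced in the missing-data item above, here is a small sketch of these tools on a made-up column of expression intensities.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'intensity': [0.8, np.nan, 0.4, np.nan, 0.9]})

    print(df['intensity'].fillna(0.0))     # fill missing values with a constant
    print(df.dropna())                     # drop rows that contain missing values
    print(df['intensity'].interpolate())   # linear interpolation between neighbours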
3.1.4 TensorFlow
TensorFlow is an end-to-end open-source machine learning framework. Its features include:
• A set of high-level APIs, such as Keras and Estimators, for building and training models more easily and efficiently.
• Integration with other libraries and tools for data analysis and visual-
ization, such as Pandas, NumPy, and Matplotlib.
3.2 Deep Learning
Input, hidden, and output layers are the three main types of layers that make up a neural network. The input layer receives the input data, which the hidden layers subsequently process. The output layer produces the neural network's output. Weighted connections link every neuron in a hidden layer to the neurons in the adjacent layers, and these weights are adjusted during training to optimise the network's performance. (A minimal sketch of such a forward pass appears after the list below.)
• Gaming: Neural networks are used in gaming for game AI, player
behavior prediction, and game environment generation.
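As promised above, the following toy NumPy forward pass grounds the layer description; the layer sizes and the ReLU activation are illustrative assumptions, not the project's actual configuration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy layer sizes (all assumed): 4 inputs -> 8 hidden neurons -> 3 outputs.
    W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
    W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

    def forward(x):
        h = np.maximum(0.0, x @ W1 + b1)  # hidden layer with ReLU activation
        return h @ W2 + b2                # output layer (raw scores)

    print(forward(rng.normal(size=4)))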
3.6 Terminologies
Action units (AUs) are the elementary facial muscle movements defined by the Facial Action Coding System (FACS). Researchers and developers can use these AUs to create a feature vector that captures the key facial movements associated with a particular emotion. By combining and analyzing the presence or absence of specific AUs, a facial expression recognition (FER) system can classify an expression into one of several emotion categories, such as happiness, sadness, or surprise.
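As a simplified illustration (not a complete FER system), AU activations can be encoded as a binary feature vector and matched against prototype patterns; the rules below use classic FACS examples such as the AU6 + AU12 "Duchenne smile".

    # Toy illustration only: encode AU activations as a binary feature vector
    # and match them against simplified prototype patterns from FACS.
    active_aus = {6, 12}            # AU6 (cheek raiser) + AU12 (lip corner puller)

    feature_vector = [1 if au in active_aus else 0 for au in range(1, 29)]

    if active_aus >= {6, 12}:       # Duchenne-smile pattern
        print('happiness')
    elif active_aus >= {1, 4, 15}:  # inner brow raiser + brow lowerer + lip depressor
        print('sadness')
    else:
        print('unknown')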
CHAPTER 4 Architecture
One advantage of using frequency-domain neural networks (FDNNs) in FER is that they can be more robust
to variations in lighting and facial orientation, which can affect the
spatial content of the facial image. Additionally, FDNNs can be more
efficient at analyzing signals with complex, time-varying patterns, which
are common in facial expressions.
However, there are also some challenges to using FDNNs in FER. One
challenge is that the choice of frequency resolution and windowing function
used in the Fourier transform can have a big impact on how well the
FDNN performs. Additionally, FDNNs may be more computationally
intensive than traditional neural networks due to the need for Fourier
transforms and other frequency-domain operations.
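As a small illustration of the windowing choice mentioned above, the sketch below applies a 2-D Hann window before taking the 2-D FFT of an image; the window type and the random image are placeholder assumptions.

    import numpy as np

    img = np.random.rand(48, 48)                  # stand-in for a face image

    # 2-D Hann window: outer product of two 1-D Hann windows.
    window = np.outer(np.hanning(48), np.hanning(48))

    spectrum = np.fft.fft2(img * window)          # windowed 2-D FFT
    magnitude = np.abs(np.fft.fftshift(spectrum)) # centre the zero frequency
    print(magnitude.shape)                        # (48, 48)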
Each module differs in some way from the others, and each serves a specific purpose. This design also includes a dataset for training and testing. The dataset, which was collected from Kaggle and is used to train and test the system, has about 2,000 photographs. Facial expression recognition (FER) research frequently uses the FER2013 dataset, which was initially made available for a Kaggle challenge. Anger, disgust, fear, happiness, neutrality, sadness, and surprise are the seven emotion categories into which the dataset's 35,887 grayscale images of 48 by 48 pixels are classified.
After the pre-processing steps, the image is converted into the frequency domain using the DCT (Discrete Cosine Transform). The DCT is a common technique used to extract frequency features from images. It is similar to the more widely known Fourier transform, but it is better suited for compressing and analyzing image data. By converting the image into the frequency domain, we can extract information about the spatial frequencies present in the image, which can be useful for recognizing specific patterns or objects.
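A minimal sketch of this conversion step using SciPy's DCT is given below. The 48 by 48 input matches the FER2013 image size, while the 24 by 24 upper-left block kept as features is an assumption for illustration.

    import numpy as np
    from scipy.fft import dctn

    img = np.random.rand(48, 48)          # stand-in for a preprocessed face image

    coeffs = dctn(img, norm='ortho')      # 2-D DCT of the whole image
    low_freq = coeffs[:24, :24]           # upper-left block: low spatial frequencies

    print(low_freq.shape)                 # (24, 24) feature map for the next layer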
Researchers often utilise a sizable face image dataset, such as the FER2013 dataset, to train the LMK, using a neural network to improve the filter coefficients. A loss function that gauges the effectiveness of the facial expression recognition system can be minimised by training the neural network using a variety of methods, including backpropagation or genetic algorithms.
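The exact LMK formulation is not reproduced here, but one plausible reading is a layer that multiplies the input frequency map element-wise by trainable weights optimised by backpropagation. A hedged Keras sketch under that assumption:

    import tensorflow as tf

    class LearnableMultiplicationKernel(tf.keras.layers.Layer):
        # Element-wise multiplication of the input frequency map by a
        # trainable weight matrix of the same shape (an assumed LMK reading).
        def build(self, input_shape):
            self.kernel = self.add_weight(name='kernel',
                                          shape=input_shape[1:],
                                          initializer='ones',
                                          trainable=True)

        def call(self, inputs):
            return inputs * self.kernel   # learned per-frequency scaling

    x = tf.random.normal((1, 24, 24))     # batch of one 24 x 24 DCT map
    print(LearnableMultiplicationKernel()(x).shape)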
The summarization layer performs pooling in the frequency domain in a way that takes into account the unique characteristics of this domain. While pooling is an easy method for reducing dimensionality in the spatial domain and is widespread in CNNs, it can lead to information loss when applied directly in the frequency domain.
The classification layer for FER (facial expression recognition) is designed using an ANN (artificial neural network). The classification layer is built from fully connected layers and receives the output of the summarization layer as its input. The hidden layers are fully connected to the input layer and the output layer, with the dropout technique being used to avoid overfitting.
The two hidden layers are fully connected to both the input layer and the output layer. As the input data moves through the hidden layers, the network learns progressively more sophisticated representations of the data. The dropout strategy, which randomly removes some of the neurons during training and forces the network to acquire more robust features, is used to prevent overfitting.
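A hedged Keras sketch of such a classification layer is given below; the layer widths, dropout rate, and input shape are assumptions, and only the seven-class softmax output is fixed by the dataset.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(24, 24)),                  # summarization-layer output
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation='relu'),   # hidden layer 1 (assumed width)
        tf.keras.layers.Dropout(0.5),                    # dropout against overfitting
        tf.keras.layers.Dense(128, activation='relu'),   # hidden layer 2 (assumed width)
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(7, activation='softmax'),  # 7 emotion classes
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.summary()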
This method is more effective than using only manually crafted features, since the network is able to learn more intricate representations of the input data. The most pertinent features for FER can be extracted by learning features from the upper-left DCT coefficients, which are crucial for representing an image.
The BSS (block sub-sampling) is a more advanced technique for dimension reduction that takes into account the high-frequency components that may be important for preserving facial details. By using a weight-shared LMK together with the BSS, more detailed and nuanced features can be extracted from each local region of the input image.
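The exact BSS reduction rule is not specified in this summary, so the toy sketch below makes one assumption: each 2 by 2 block of the frequency map is reduced to its largest-magnitude coefficient, which preserves strong high-frequency responses that plain averaging would dilute.

    import numpy as np

    freq_map = np.random.randn(24, 24)    # stand-in frequency-domain feature map

    # Group the map into 2 x 2 blocks and keep the largest-magnitude
    # coefficient of each block (the reduction rule is an assumption).
    blocks = freq_map.reshape(12, 2, 12, 2).swapaxes(1, 2).reshape(12, 12, 4)
    idx = np.abs(blocks).argmax(axis=-1)
    subsampled = np.take_along_axis(blocks, idx[..., None], axis=-1)[..., 0]

    print(subsampled.shape)               # (12, 12)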
CHAPTER 5 Experimental Framework
In the network design, fully connected layers come after the convolutional layers. The convolutional layers are intended to learn the most important frequency filters for recognising facial expressions. The fully connected layers then use the learned features to classify the facial expressions.
In conclusion, this experimental setup demonstrates the effectiveness of
using a frequency neural network for recognising facial expressions. By
transforming the image data into the frequency domain, the network is
able to capture the frequency components of the facial features, leading
to improved accuracy and performance.
The dataset was collected from the internet and manually annotated by human labelers. The distribution of labels is imbalanced: happiness is the most frequent emotion, followed by neutrality and sadness, while disgust has by far the lowest frequency in the dataset.
In addition to the image data, the FER2013 dataset includes a CSV file with the images and their corresponding emotion labels. The labels are represented as integers ranging from 0 to 6, with 0 signifying anger, 1 disgust, 2 fear, 3 happiness, 4 sadness, 5 surprise, and 6 neutrality.
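A short pandas sketch of reading the FER2013 CSV is shown below; the column names emotion and pixels follow the standard Kaggle release of the dataset.

    import numpy as np
    import pandas as pd

    # Standard FER2013 release: one row per image, 'emotion' in 0-6 and
    # 'pixels' holding 2304 space-separated grayscale values (48 x 48).
    df = pd.read_csv('fer2013.csv')

    emotion_names = ['anger', 'disgust', 'fear', 'happiness',
                     'sadness', 'surprise', 'neutrality']

    img = np.array(df['pixels'].iloc[0].split(), dtype=np.uint8).reshape(48, 48)
    print(img.shape, emotion_names[df['emotion'].iloc[0]])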
Table 5.1 shows the number of examples of each expression category in the FER2013 database. Disgust and surprise expression data, however, are far less abundant than those for happiness, sadness, and anger. This reflects objective facts about how people express their emotions. A significant factor limiting the improvement of the facial expression recognition rate is the severely imbalanced distribution of data samples. Using a constrained cycle-consistent adversarial network to augment the data improves all expression classes. The distribution of facial expressions in the FER2013 database after the addition of generated disgust expressions is shown in Table 5.2, along with the sizes of the test and training sets. As there are few disgust expressions in the FER2013 dataset, new disgust-expression samples are generated to supplement the existing data, and a few samples are added to the surprise expression category. The distribution of the final sample set is essentially balanced. The test-set samples were not augmented.
CHAPTER 6 Conclusions and Future Scope
6.1 Conclusions
The FER outcomes show that our algorithms are able to learn frequency-domain data and predict facial expressions. The parameter analysis and the ablation study both demonstrate the effectiveness of the proposed techniques. The comparisons with other state-of-the-art methods show that our FreNets have low computational cost and promising performance.
The suggested multiplication layer helps with feature learning on the upper-left DCT coefficients, which greatly boosts FER performance, according to the comparisons between Mult and DCT. The Basic-FreNet obtains greater recognition accuracy than Mult, demonstrating that the summarization layer can improve performance even further. The upper-left DCT coefficients have commonly been used directly as classification input for FER as handcrafted features. The comparisons of DCT and Basic-FreNet show that our recommended techniques can perform feature learning on this common handcrafted feature, and the features that are created are more beneficial for FER.
REFERENCES
[1] Xiaofeng Liu, BVK Vijaya Kumar, Ping Jia, and Jane You. “Hard
negative generation for identity-disentangled facial expression recog-
nition”. In: Pattern Recognition 88 (2019), pp. 1–12.
[2] Yan Tang, Xing Ming Zhang, and Haoxiang Wang. “Geometric-
convolutional feature fusion based on learning propagation for facial
expression recognition”. In: IEEE Access 6 (2018), pp. 42532–42540.
[3] Anima Majumder, Laxmidhar Behera, and Venkatesh K Subrama-
nian. “Automatic facial expression recognition system using deep
network-based data fusion”. In: IEEE transactions on cybernetics
48.1 (2016), pp. 103–114.
[4] Junkai Chen, Zenghai Chen, Zheru Chi, and Hong Fu. “Facial
expression recognition in video with multiple feature fusion”. In:
IEEE Transactions on Affective Computing 9.1 (2016), pp. 38–50.
[5] An Chen, Hang Xing, and Feiyu Wang. “A facial expression recog-
nition method using deep convolutional neural networks based on
edge computing”. In: IEEE Access 8 (2020), pp. 49741–49751.
[6] Young-Woon Lee, Ji-Hae Kim, Young-ju Choi, and Byung-Gyu Kim.
“CNN-based approach for visual quality improvement on HEVC”.
In: 2018 IEEE International Conference on Consumer Electronics
(ICCE). IEEE. 2018, pp. 1–3.
[7] Ali Raza Shahid, Shehryar Khan, and Hong Yan. “Human expression
recognition using facial shape based Fourier descriptors fusion”. In:
Twelfth International Conference on machine vision (ICMV 2019).
Vol. 11433. SPIE. 2020, pp. 180–186.
[8] Bendjillali Ridha Ilyas, Beladgham Mohammed, Merit Khaled, Ab-
delmalik Taleb Ahmed, and Alouani Ihsen. “Facial expression recog-
nition based on DWT feature for deep CNN”. In: 2019 6th Interna-
tional Conference on Control, Decision and Information Technologies
(CoDIT). IEEE. 2019, pp. 344–348.
[9] Tong Zhang, Wenming Zheng, Zhen Cui, Yuan Zong, and Yang
Li. “Spatial–temporal recurrent neural network for emotion recog-
nition”. In: IEEE transactions on cybernetics 49.3 (2018), pp. 839–
847.
[10] Xiaofeng Liu, BVK Vijaya Kumar, Yubin Ge, Chao Yang, Jane You,
and Ping Jia. “Normalized face image generation with perceptron
generative adversarial networks”. In: 2018 IEEE 4th International
Conference on Identity, Security, and Behavior Analysis (ISBA).
IEEE. 2018, pp. 1–8.
[11] Sepidehsadat Hosseini and Nam Ik Cho. “GF-CapsNet: Using ga-
bor jet and capsule networks for facial age, gender, and expression
recognition”. In: 2019 14th IEEE International Conference on Au-
tomatic Face & Gesture Recognition (FG 2019). IEEE. 2019, pp. 1–
8.
[12] Huiyuan Yang, Umur Ciftci, and Lijun Yin. “Facial expression
recognition by de-expression residue learning”. In: Proceedings of
the IEEE conference on computer vision and pattern recognition.
2018, pp. 2168–2177.
[13] Shervin Minaee, Mehdi Minaei, and Amirali Abdolrashidi. “Deep-
emotion: Facial expression recognition using attentional convolu-
tional network”. In: Sensors 21.9 (2021), p. 3046.
[14] Kai Wang, Xiaojiang Peng, Jianfei Yang, Debin Meng, and Yu
Qiao. “Region attention networks for pose and occlusion robust
facial expression recognition”. In: IEEE Transactions on Image
Processing 29 (2020), pp. 4057–4069.
[15] Jinhua Lin and Yu Yao. “A fast algorithm for convolutional neu-
ral networks using tile-based fast Fourier transforms”. In: Neural
Processing Letters 50 (2019), pp. 1951–1967.
[16] Said Elaiwat, Mohammed Bennamoun, and Farid Boussaïd. “A
spatio-temporal RBM-based model for facial expression recognition”.
In: Pattern Recognition 49 (2016), pp. 152–161.
[17] Shilpashree Rao and MV Bhaskara Rao. “A novel triangular DCT
feature extraction for enhanced face recognition”. In: 2016 10th
International Conference on Intelligent Systems and Control (ISCO).
IEEE. 2016, pp. 1–6.
[18] Anjali Goel and Virendra P Vishwakarma. “Efficient feature extrac-
tion using DCT for gender classification”. In: 2016 IEEE Interna-
tional Conference on Recent Trends in Electronics, Information &
Communication Technology (RTEICT). IEEE. 2016, pp. 1925–1928.