Aryan - Face Detection - Aryan Masih
PROJECT-2 REPORT
Face Detection Using Machine Learning
Submitted By
Aryan Masih
UNIVERSITY POLYTECHNIC
GREATER NOIDA - 201306 (APRIL – 2021)
DECLARATION
Project Title: Face Detection Using Machine Learning. Degree for which the
project work is submitted: Diploma in Computer Science and Engineering.
I declare that the presented project represents largely my own ideas and work
in my own words. Where others' ideas or words have been included, I have
adequately cited and listed them in the reference materials. The report has been
prepared without resorting to plagiarism. I have adhered to all principles of
academic honesty and integrity. No falsified or fabricated data have been
presented in the report. I understand that any violation of the above will be
cause for disciplinary action by the Institute, including revocation of the
conferred degree, if conferred, and can also invoke penal action from sources
which have not been properly cited or from whom proper permission has not
been taken.
Date: 5/4/2021
Aryan Masih
Admission no. :- 18GPTC4060113
CERTIFICATE
It is certified that the work contained in this project, entitled Face Detection
Using Machine Learning, submitted by Aryan Masih for the degree of
Diploma in Computer Science and Engineering, is based on his own work
carried out under my supervision, and this project work has not been
submitted elsewhere for any degree.
GUIDE
ABSTRACT
Machine learning has been gaining momentum over the last decades: self-driving
cars, efficient web search, and speech and image recognition. These successes
gradually propagate into our daily lives. Machine learning is a class of artificial
intelligence methods that allows a computer to operate in a self-learning mode
without being explicitly programmed. It is a fascinating and complex topic that
could drive the future of technology.
Face detection is an important step in face recognition and emotion recognition,
which are among the most representative and classic applications in computer
vision. The face is one of the physiological biometrics based on stable features.
Face detection by computer systems has become a major field of interest. Face
detection algorithms are used in a wide range of applications, such as security
control, video retrieval, biometric signal processing, human-computer
interfaces, emotion detection, face recognition, and image database
management. Face detection is a challenging task because faces in images are
uncontrolled: illumination conditions, pose, and facial expressions all vary.
ACKNOWLEDGEMENT
I wish to express my profound and sincere gratitude to Ms. Nutan Gusain
(Assistant Professor), Computer Science Engineering, University Polytechnic,
Galgotias University, Uttar Pradesh, who guided me through the intricacies of
this project with matchless magnanimity.
TABLE OF CONTENTS
• Declaration
• Certificate
• Abstract
• Acknowledgement
• List of Figures
• List of Tables
1 Introduction
1.2 Purpose
2 Literature Survey
3 Proposed Model
4 Module Split-Up
5 Implementation
6 Results and Discussions
7 Conclusions and Future Works
References
List of Figures
List of Tables
Chapter 1
Introduction
The approach involves the following steps [4]:
1. Pre-Processing: To reduce variability among faces, the images are processed
before they are fed into the network. All positive examples, i.e. the face images,
are obtained by cropping images with frontal faces to include only the front view.
All the cropped images are then corrected for lighting using standard
algorithms.
3. Localization: The trained neural network is then used to search for faces in an
image and, if present, localize them in a bounding box.
(Figure: various features of the face on which the work has been done.)
Face detection is the most important step of emotion recognition. Beyond
emotion recognition, face detection is also the first step in Human-Computer
Interaction (HCI) systems, e.g. expression recognition. Unlike traditional HCI
devices such as the keyboard, mouse, and display, it provides more effective
methods to improve the user's experience with the computer and, as a result,
speeds up human work. It conveys information from the physical world into
logical form to control the computer system. In addition, face detection is a
form of object detection, which is used to classify a desired object in given
images or video and locate it.
License plate detection is another example of object detection.
1.2 Purpose
The aim of this project is to develop a system that detects human faces in
digital images effectively and recognizes their facial expressions regardless of
the person's ethnicity, pose, etc. Input images may vary in face size, background
complexity, and illumination conditions. Face detection is widely used in
biometrics, photography, etc.
Analysis of facial expressions plays a fundamental role in applications based on
emotion recognition, such as Human-Computer Interaction (HCI), social robots,
animation, alert systems, pain monitoring for patients, movie recommendation
by mood, mental state identification, etc.
Faces form a class of fairly similar objects. Each face consists of the same
components in the same geometrical configuration. This is the main reason for the
success of frontal face detection systems. However, the problem of pose invariance
is still unsolved. Detecting faces which are rotated in depth remains a challenging
task.
The motivation behind this project is that facial detection has a multitude of
possible applications, from common household objects like digital cameras that
automatically focus on human faces to security cameras that match a face to a
person's identity. Webcams are often used as a security measure for locking a
personal computer. With the rapid development of technology, it is desirable to
build intelligent systems that can understand human emotion. Cameras can also
use this technology to track human faces and keep a count of the number of
people in a shot, in a certain location, or even coming in through an entrance.
This technology can be further narrowed down to the recognition and tracking
of eyes, which would save power by dimming a screen when the viewer is not
looking. For this project, we hope to use an already existing algorithm as a basis
for face detection and emotion recognition and build upon it to create
improvements and explore more data.
Chapter 2
Literature Survey
Face detection is a computer technology that determines the location and size of
human faces in an arbitrary (digital) image. The facial features are detected and
any other objects, such as trees, buildings, and bodies, are ignored. It can be
regarded as a specific case of object-class detection, where the task is to find the
locations and sizes of all objects in an image that belong to a given class. Face
detection can also be regarded as a more general case of face localization, in
which the task is to find the locations and sizes of a known number of faces
(usually one). There are basically two types of approaches for detecting the
facial part of a given image: feature-based and image-based. The feature-based
approach tries to extract features of the image and match them against
knowledge of facial features, while the image-based approach tries to get the
best match between training and testing images.
[...] analyzed objectively, and what the main problems are when working with
emotions. Facial expressions are produced by facial muscle movements that
result in temporary wrinkles in the facial skin and the temporary deformation
or displacement of facial features such as the eyebrows, eyelids, nose, and
mouth. In most cases a facial expression persists only briefly, usually no more
than a few seconds.
Emotion: Anger
Definition: Anger is one of the most dangerous emotions. This emotion may be
harmful, so humans try to avoid it. Secondary emotions of anger are irritation,
annoyance, frustration, hate, and dislike.
Motion of facial parts: Eyebrows pulled down, eyes open, teeth shut, lips
tightened, upper and lower lids pulled up.

Emotion: Fear
Definition: Fear is the emotion of danger. It may be due to danger of physical or
psychological harm. Secondary emotions of fear are horror, nervousness, panic,
worry, and dread.
Motion of facial parts: Outer eyebrow down, inner eyebrow up, mouth open,
jaw dropped.

Emotion: Happiness
Definition: Happiness is the expression most desired by humans. Secondary
emotions are cheerfulness, pride, relief, hope, pleasure, and thrill.
Motion of facial parts: Eyes open, mouth edges up, mouth open, lip corners
pulled up, cheeks raised, and wrinkles around the eyes.

Emotion: Surprise
Definition: This emotion comes when unexpected things happen. Secondary
emotions of surprise are amazement and astonishment.
Motion of facial parts: Eyebrows up, eyes open, mouth open, jaw dropped.
• Image Acquisition
• Image Pre-processing
• Image Segmentation
• Feature Extraction
1. Image Acquisition:
Static images or image sequences are used for facial expression recognition.
Two-dimensional grayscale facial images are the most popular for facial
expression recognition, although color images can convey more information
about emotion, such as blushing. In the future, color images will be preferred
because of the low-cost availability of color imaging equipment. For image
acquisition, a camera, cell phone, or other digital device is used.
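The acquisition step can be sketched in a few lines. This is a minimal illustration, not code from the report: in a live system the frame would come from a device such as cv2.VideoCapture, while here a synthetic NumPy array stands in for a camera capture.

```python
import numpy as np

def acquire_grayscale(frame_bgr):
    """Reduce a BGR color frame (as most cameras deliver via OpenCV)
    to the 2-D grayscale image used by the later processing stages,
    using the standard ITU-R BT.601 luma weights."""
    b = frame_bgr[..., 0].astype(np.float32)
    g = frame_bgr[..., 1].astype(np.float32)
    r = frame_bgr[..., 2].astype(np.float32)
    return (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)

# A synthetic 48x48 "camera frame" stands in for a live capture here.
frame = np.zeros((48, 48, 3), dtype=np.uint8)
frame[..., 2] = 255                     # a pure-red frame (BGR layout)
gray = acquire_grayscale(frame)         # 2-D, one intensity per pixel
```

In the actual project this conversion is done by OpenCV's cvtColor; the weights above simply make the operation explicit.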
2. Image Pre-processing:
3. Image Segmentation:
Segmentation separates an image into meaningful regions. Segmentation of an
image is a method of dividing the image into homogeneous, self-consistent
regions corresponding to different objects in the image on the basis of texture,
edges, and intensity.
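The simplest instance of intensity-based segmentation is thresholding, which the report uses later to obtain binary images. A minimal NumPy sketch (illustrative, not the project's code):

```python
import numpy as np

def threshold_segment(gray, thresh=128):
    """Divide a grayscale image into two homogeneous regions,
    foreground vs. background, purely on the basis of intensity."""
    return np.where(gray >= thresh, 255, 0).astype(np.uint8)

img = np.array([[10, 200],
                [130, 40]], dtype=np.uint8)
mask = threshold_segment(img)
# The two bright pixels become foreground (255), the rest background (0).
```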
4. Feature Extraction:
Feature extraction can be considered as finding the "interesting" parts of an
image. It includes information about the shape, motion, color, and texture of the
facial image, extracting the meaningful information from the image. Compared
to the original image, feature extraction significantly reduces the amount of
information, which is an advantage for storage.
Chapter 3
Proposed Model
In the image processing stage, the facial region is extracted and then the facial
components are extracted.
The feature vector extraction method is the most important key point in the
emotion recognition problem. In particular, a good feature vector is necessary
for better recognition accuracy. In the facial feature extraction stage, we
propose a new feature vector extraction method. The proposed method divides
the whole image into three feature regions: the eye region, the mouth region,
and an auxiliary region. Several kinds of information are extracted from each
region: geometric and shape information.
Feature   Description                             Size
Xe1       Distance between the two eyebrows.      1x1
...
Xm2       Distance between the nose and mouth.    1x1
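Distance features such as Xe1 and Xm2 can be sketched as follows. The landmark names used here are hypothetical illustrations, not taken from the report; only the normalization by face width follows the text.

```python
import math

def geometric_features(landmarks, face_width):
    """Compute distance features like Xe1 (between the eyebrows) and
    Xm2 (nose to mouth) from 2-D landmark points, normalized by the
    width of the facial image as the text describes."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return {
        "Xe1": dist(landmarks["left_brow"], landmarks["right_brow"]) / face_width,
        "Xm2": dist(landmarks["nose_tip"], landmarks["mouth_top"]) / face_width,
    }

# Hypothetical landmark coordinates on a 100-pixel-wide face crop:
pts = {"left_brow": (30, 40), "right_brow": (70, 40),
       "nose_tip": (50, 70), "mouth_top": (50, 85)}
feats = geometric_features(pts, face_width=100)   # Xe1 = 0.4, Xm2 = 0.15
```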
Table 3.1 shows the specific features of the eye region; the four features
represent geometric information about the eye and eyebrow. Table 3.2 shows
the features of the mouth region; there are two features for geometric
information. Since the size of a facial image is not fixed, we need to normalize
the feature vector. In this paper, all features are normalized by the width of the
facial image.
Comparing images directly is not easy and takes much time to compute. To
overcome this difficulty, a new calculation method is used to compare a facial
component image with a template. Let Xw, Xh, and Xp be the width, height, and
number of pixels of the image. The similarity S can then be calculated as
where Tw, Th, and Tp are the width, height, and number of pixels of the
template.
The system architecture for face detection is given below:
Chapter 4
Module Split-Up
The project is split into three modules:
• Face Detection,
• Feature Extraction,
• Emotion Recognition.
For the face detection module we used a Haar cascade classifier, which easily
detects faces in an image or a video frame. For the emotion recognition module
we wrote a Python script to train a custom supervised machine learning model
using TensorFlow and Keras that is able to recognize the emotion of a face. We
used a 5-layered Convolutional Neural Network; the first layer is the input layer
and the last layer is the output layer.
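The report does not list the exact filter sizes, so the following sketch only traces how the spatial dimensions of a 48x48 input could shrink through a hypothetical stack of 3x3 convolutions ('same' padding), each followed by pooling with stride 2, one plausible arrangement of the three hidden layers:

```python
def out_size(size, kernel, stride=1, padding="same"):
    """Spatial output size of a convolution or pooling layer."""
    if padding == "same":
        return -(-size // stride)              # ceil(size / stride)
    return (size - kernel) // stride + 1       # 'valid' padding

size = 48                                      # 48x48 grayscale input
for stage in range(3):                         # three hidden conv+pool stages
    size = out_size(size, 3, stride=1, padding="same")   # conv keeps the size
    size = out_size(size, 2, stride=2, padding="same")   # pooling halves it
# size goes 48 -> 24 -> 12 -> 6: the grid the final dense layer would see
```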
Neural networks consist of individual units called neurons. Neurons are
arranged in a series of groups, called layers. Neurons in each layer are
connected to neurons of the next layer. Data flows from the input layer to the
output layer along these connections. Each individual node performs a simple
mathematical calculation and then transmits its result to all the nodes it is
connected to. [1]
The latest wave of neural networks came with the increase in computing power
and the accumulation of experience. This brought deep learning, in which the
structures of neural networks have become more complex and able to solve a
wide range of tasks that could not be solved effectively before. Image
classification is a prominent example.
Let us consider the use of CNNs for image classification in more detail. The main
task of image classification is to accept an input image and determine its class.
This is a skill that people learn from birth: we can easily determine that the
image in a picture is an elephant. The computer, however, sees pictures quite
differently:
Instead of the image, the computer sees an array of pixels. For example, if the
image size is 300 x 300, the size of the array will be 300x300x3, where 300 is the
width, the next 300 is the height, and 3 is the number of RGB channels. Each of
these numbers is assigned a value from 0 to 255, describing the intensity of the
pixel at that point. To solve the classification problem, the computer looks for
characteristics at the base level. In human understanding, such characteristics
are, for example, the trunk or large ears; for the computer, these characteristics
are boundaries or curvatures. Then, through groups of convolutional layers, the
computer constructs more abstract concepts. In more detail: the image is passed
through a series of convolutional, nonlinear, pooling, and fully connected
layers, and then generates the output.
The convolution layer is always first. The image (a matrix of pixel values) is fed
into it. Imagine that reading of the input matrix begins at the top left of the
image. Next, the software selects a smaller matrix there, which is called a filter
(or neuron, or kernel). The filter then produces a convolution, i.e. it moves
along the input image. The filter's task is to multiply its values by the original
pixel values. All these multiplications are summed up, and one number is
obtained in the end. Since the filter has read the image only in the upper-left
corner, it moves further and further right by one unit, performing a similar
operation. After passing the filter across all positions, a matrix is obtained that
is smaller than the input matrix.
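The procedure above can be written out directly, slide the filter, multiply element-wise, and sum. This toy NumPy implementation is for illustration only; real CNN frameworks perform the same operation far more efficiently.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the filter across the image one unit at a time, multiply
    its values by the underlying pixels, and sum them: the procedure
    described in the text. The output matrix is smaller than the input."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)   # a toy 4x4 "image"
k = np.ones((2, 2))                              # a 2x2 summing filter
res = convolve2d_valid(img, k)                   # 3x3 output from 4x4 input
# res[0, 0] = 0 + 1 + 4 + 5 = 10
```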
Figure 4.2: Procedure of the retrieval module and data management module in
large-scale face detection.
Chapter 5
Implementation
In face and emotion recognition system we have used the following Libraries:
• cv2
• numpy
• keras
• pandas
• TensorFlow
• imutils
We downloaded the dataset from Kaggle; it can be easily downloaded by
registering on the Kaggle website [5]. The data consists of 48x48-pixel grayscale
images of faces. The faces have been automatically registered so that the face is
more or less centered and occupies about the same amount of space in each
image. The task is to categorize each face, based on the emotion shown in the
facial expression, into one of seven categories (0=Angry, 1=Disgust, 2=Fear,
3=Happy, 4=Sad, 5=Surprise, 6=Neutral).
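The integer labels above are turned into one-hot vectors for training (process.py does this with pandas' get_dummies). A minimal sketch of the encoding:

```python
import numpy as np

CLASSES = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def one_hot(label, n_classes=7):
    """Encode an integer emotion label (0-6) as the one-hot vector
    the classifier is trained against."""
    vec = np.zeros(n_classes, dtype=np.float32)
    vec[label] = 1.0
    return vec

happy = one_hot(3)   # label 3 ("Happy") -> [0, 0, 0, 1, 0, 0, 0]
```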
First we load and process the dataset. The process.py module first sets the
dataset path to 'fer2013/fer2013.csv' (fer2013.csv is our dataset) and sets the
image size to (48, 48). There are two methods in process.py: load_fer2013 and
preprocess_input. The code snippet for load_fer2013 is below:
def load_fer2013():
    data = pd.read_csv(dataset_path)
    pixels = data['pixels'].tolist()
    width, height = 48, 48
    faces = []
    for pixel_sequence in pixels:
        face = [int(pixel) for pixel in pixel_sequence.split(' ')]
        face = np.asarray(face).reshape(width, height)
        face = cv2.resize(face.astype('uint8'), image_size)
        faces.append(face.astype('float32'))
    faces = np.asarray(faces)
    faces = np.expand_dims(faces, -1)
    emotions = pd.get_dummies(data['emotion']).as_matrix()
    return faces, emotions
The preprocess_input method scales the pixel values to the range [-1, 1]:

def preprocess_input(x, v2=True):
    # scale pixel values to [0, 1], then shift to [-1, 1]
    x = x.astype('float32')
    x = x / 255.0
    if v2:
        x = x - 0.5
        x = x * 2.0
    return x
Then we train the model for emotion recognition. The number of epochs used is
106; an epoch is one full pass over the training data, so the model sees the
dataset 106 times. Below are snippets of the system while training:
The train.py module is used to train the model. We used a 5-layered
Convolutional Neural Network: the first layer is the input layer, then there are 3
hidden layers, and finally the output layer. A code snippet for the layers of the
CNN is below:
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
x = layers.add([x, residual])
The next module is video.py. This module is used to detect faces and emotions
in a video frame. First we use the Haar cascade classifier for face detection in
the video frame, and then the detected face is passed as input to the emotion
recognition module. The video frame is first converted to grayscale, then the
image is scaled, and the target face is marked with a rectangle around it.
Then we load the trained model and create a list of target emotions: angry,
disgust, scared, happy, sad, surprised, neutral. The trained model then predicts
the emotion of the faces in a video frame or in an image. The code snippet for
video capture, loading the Haar cascade classifier and Keras model, and the
target emotion list is as follows:
face_detection = cv2.CascadeClassifier(detection_model_path)
emotion_classifier = load_model(emotion_model_path, compile=False)
EMOTIONS = ["angry", "disgust", "scared", "happy",
            "sad", "surprised", "neutral"]

# starting video streaming
cv2.namedWindow('your_face')
camera = cv2.VideoCapture(0)
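Once the classifier returns a probability for each of the seven emotions, video.py labels the face with the most likely one. The selection step can be sketched as follows (the probabilities shown are a hypothetical softmax output, not measured results):

```python
import numpy as np

EMOTIONS = ["angry", "disgust", "scared", "happy",
            "sad", "surprised", "neutral"]

def label_from_probs(preds):
    """Return the emotion whose predicted probability is highest,
    as done for each detected face in the video loop."""
    return EMOTIONS[int(np.argmax(preds))]

# A hypothetical softmax output from the trained model:
probs = [0.05, 0.02, 0.08, 0.60, 0.10, 0.05, 0.10]
label = label_from_probs(probs)          # -> "happy"
```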
Chapter 6
Results and Discussions
The model was trained on a machine with low computing power. The accuracy
achieved in 4 epochs is 48.33%. Accuracy can be increased to about 70% by
increasing the number of epochs to 100 on a machine with high computing
power.
The face detection and emotion recognition system works better under bright
lighting conditions and with a good-quality web camera. The system is able to
accurately detect sad, angry, fear, happy, and neutral faces in a video frame,
and fear, happy, angry, disgust, scared, surprised, and neutral faces in an
image file.
The images for testing were downloaded from Shutterstock (a website for free
stock images) [8]. Results using an input image are below:
Figure 6.2: Result using input images
Chapter 7
Conclusions and Future Works
In this project on face and emotion detection, I have studied various techniques
for face and emotion recognition. The techniques included the Viola-Jones
algorithm, which detects the various parts of the face; histogram equalization,
which is used to adjust image intensities; and thresholding, which is used to
create a binary image from a grayscale image. Using all these techniques, key
points of the face are extracted, which supply the training data set for
classification.
The following two techniques are used for the respective tasks in the face
recognition system.
Haar feature-based cascade classifier: It detects frontal faces in an image well.
It runs in real time and is faster in comparison to other face detectors. We used
the implementation from OpenCV.
CNN model: We trained a classification CNN model architecture which takes a
bounded face (48x48 pixels) as input and predicts the probabilities of 7
emotions in the output layer. The face detection module is not able to correctly
detect tilted faces or faces with glasses, and the emotion recognition module is
not able to detect surprised faces correctly via webcam, though it works well for
input images.
Future areas of development include detection of tilted faces as well as faces
with glasses, and correctly identifying disgusted and surprised faces in a video
stream. We will also try to speed up the system and identify emotions with up
to 90% accuracy by increasing the number of hidden layers of the CNN and by
using or creating a better dataset.
References
[7] Monika Dubey and Lokesh Singh. Automatic Emotion Recognition Using
Facial Expression. 2016. URL: https://www.irjet.net/archives/V3/i2/IRJETV3I284.pdf.
[8] Stock assets to power your creativity. https://www.shutterstock.com/.
Accessed: 2019-03-24.