"Blue Eye Technology": Seminar
"Blue Eye Technology": Seminar
"Blue Eye Technology": Seminar
A
Seminar
submitted
in partial fulfillment
for the award of the Degree of
Bachelor of Technology
in the Department of Computer Science and Engineering.
April, 2020
Candidate’s Declaration
I hereby declare that the work, which is being presented in the Seminar, entitled “Blue Eye
Technology” in partial fulfillment for the award of Degree of “Bachelor of Technology” in
Dept. of Computer Science and Engineering with Specialization in Computer Science, and
submitted to the Department of Computer Science and Engineering, JIET College of
Engineering, Rajasthan Technical University, is a record of my own work carried out under the
guidance of Prof. Jaya Gangwani, Department of Computer Science and Engineering, Jodhpur
Institute of Engineering & Technology, Jodhpur.
I have not submitted the matter presented in this Seminar anywhere for the award of any other
Degree.
Mohit Soni
Computer Science and Engineering,
JIET, Jodhpur.
Counter Signed by
…………………............
Prof. Jaya Gangwani
Assistant Professor
Dept. of Computer Science and Engineering
Jodhpur Institute of Engineering and Technology
ACKNOWLEDGEMENT
I thank our teachers from Jodhpur Institute of Engineering & Technology, who provided insight
and expertise that greatly helped this work. I thank Prof. Jaya Gangwani, Asst. Professor,
JIET, for assistance with the topic and for comments that greatly improved my content. I am also
immensely grateful to Prof. Mamta Garg, HOD (CSE), JIET, for giving me the chance and
encouraging me to present this seminar.
Mohit Soni
Roll no. – 16ejtcs041
Computer Science and Engineering
ABSTRACT
Is it possible to create a computer that can interact with us the way we interact with each other? For
example, imagine that one fine morning you walk into your computer room, switch on your
computer, and it tells you "Hey friend, good morning, you seem to be in a bad mood today."
It then opens your mailbox, shows you some of your mails, and tries to cheer you up. It
sounds like fiction, but it will be the life led with "BLUE EYES" in the very near future.
The basic idea behind this technology is to give the computer human-like perceptual power. We all have
perceptual abilities; that is, we can understand each other's feelings. For example, we can
understand one's emotional state by analyzing his facial expression. Adding these
human perceptual abilities to computers would enable computers to work together with human
beings as intimate partners. The "BLUE EYES" technology aims at creating computational
machines that have perceptual and sensory abilities like those of human beings.
1. INTRODUCTION
Imagine yourself in a world where humans interact with computers. You are sitting in front of
your personal computer, which can listen, talk, or even scream aloud. It has the ability to gather
information about you and interact with you through special techniques like facial recognition,
speech recognition, etc. It can even understand your emotions at the touch of the mouse. It
verifies your identity, feels your presence, and starts interacting with you. You ask the computer
to dial your friend at his office. It realizes the urgency of the situation through the mouse, dials
your friend at his office, and establishes a connection. Human cognition depends primarily on the
ability to perceive, interpret, and integrate audio-visual and sensory information. Adding such
extraordinary perceptual abilities to computers would enable computers to work together with
human beings as intimate partners. Researchers are attempting to add more capabilities to
computers that will allow them to interact like humans: recognize human presence, talk, listen, or
even guess their feelings. The BLUE EYES technology aims at creating computational machines
that have perceptual and sensory abilities like those of human beings. It uses a non-obtrusive
sensing method, employing the most modern video cameras and microphones, to identify the user's
actions through the use of imparted sensory abilities. The machine can understand what a user
wants, where he is looking, and even his physical or emotional state.
1. SYSTEM OVERVIEW
In the name BLUE EYES, "Blue" stands for Bluetooth (which enables wireless
communication) and "Eyes" because eye movement enables us to obtain a lot of interesting
information. The Blue Eyes system consists of a mobile measuring device and a central
analytical system. The mobile device is integrated with a Bluetooth module providing a wireless
interface between the sensors worn by the operator and the central unit. ID cards assigned to each
operator and adequate user profiles on the central unit side provide the necessary data
personalization. The system therefore consists of two parts:

1.1 Data Acquisition Unit: The Data Acquisition Unit is the mobile part of the Blue Eyes system. Its
main task is to fetch the physiological data from the sensor and send it to the central system to be
processed. The Data Acquisition Unit maintains the Bluetooth connection, receives information from
the sensor, and sends it on.
1.2 Central System Unit: The CSU maintains the other side of the Bluetooth connection, buffers
incoming sensor data, performs on-line data analysis, records conclusions for further exploration,
and provides a visualization interface.
Figure: Blue Eyes system overview. The central system side includes the Data Analysis and Data
Logger modules. The Data Acquisition Unit hardware comprises the Jazz Multisensor, an MC 145483
PCM codec, an Atmel 89C52 microcontroller, a Bluetooth module, a beeper, an LCD display with
LCD interface, a MAX 232 interface, and the ID card and keyboard interface.
1.3 The Software
The main task of the Blue Eyes software is to look after the working operators' physiological
condition. To assure an instant reaction to changes in the operator's condition, the software
performs real-time buffering of the incoming data, real-time physiological data analysis and alarm
triggering. The Blue Eyes software comprises several functional modules. The System Core
facilitates the data flow between the other system modules (e.g. it transfers raw data from the
Connection Manager to the data analyzers, and processed data from the data analyzers to GUI
controls, other data analyzers, the data logger, etc.). The System Core is built on single-
producer-multi-consumer thread-safe queues. Any number of consumers can register to
receive the data supplied by a producer. Every single consumer can register at any
number of producers, thereby receiving different types of data. Naturally, every consumer
may itself be a producer for other consumers. This approach enables high system scalability: new
data processing modules (i.e. filters, data analyzers and loggers) can be easily added by simply
registering as a consumer.
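As an illustration of this producer-consumer design (this is a minimal Python sketch, not the actual Blue Eyes sources; the class and method names are hypothetical), consumers register at a producer and receive its data through thread-safe queues:

import queue
import threading

class Producer:
    """Delivers each produced item to every registered consumer queue."""
    def __init__(self):
        self._queues = []
        self._lock = threading.Lock()

    def register(self):
        """A consumer calls this to obtain its own thread-safe queue."""
        q = queue.Queue()
        with self._lock:
            self._queues.append(q)
        return q

    def publish(self, item):
        """Fan the item out to all registered consumers."""
        with self._lock:
            for q in self._queues:
                q.put(item)

# Usage: the raw-data producer fans sensor samples out to two consumers.
raw_data = Producer()
analyzer_q = raw_data.register()   # e.g. a data analyzer
logger_q = raw_data.register()     # e.g. the data logger

raw_data.publish({"pulse": 72, "saccades": 3})
print(analyzer_q.get(), logger_q.get())

Because any module obtains data only by registering a queue, new analyzers or loggers can be attached without changing the producer, which is exactly the scalability argument made above.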
The Connection Manager is responsible for managing the wireless communication between
the mobile Data Acquisition Units and the central system. The Connection Manager handles:
communication with the CSU hardware
searching for new devices in the covered range
establishing Bluetooth connections
connection authentication
incoming data buffering
sending alerts
The Data Analysis module performs the analysis of the raw sensor data in order to obtain
information about the operator's physiological condition. The separately running Data
Analysis module supervises each of the working operators. The module consists of a number of
smaller analyzers extracting different types of information. Each of the analyzers registers at
the appropriate Operator Manager or at another analyzer as a data consumer and, acting as a
producer, provides the results of its analysis. The most important analyzers are:
Saccade detector - monitors eye movements in order to determine the level of the operator's
visual attention
Pulse rate analyzer - uses the blood oxygenation signal to compute the operator's pulse rate
(a sketch of such an analyzer follows this list)
Custom analyzers - recognize behaviors other than those built into the system.
The new modules are created using the C4.5 decision tree induction algorithm
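As a rough illustration of a pulse rate analyzer (not the actual Blue Eyes implementation; the sampling rate, threshold and signal shape below are assumptions), the pulse can be estimated by counting peaks in the blood oxygenation waveform over a time window:

import numpy as np

def pulse_rate_bpm(oxygenation, fs=60.0, min_gap_s=0.4):
    """Estimate pulse rate (beats per minute) from a blood oxygenation signal.

    oxygenation: 1-D array of samples, fs: sampling rate in Hz,
    min_gap_s: minimum plausible spacing between two heart beats in seconds.
    """
    x = np.asarray(oxygenation, dtype=float)
    x = x - x.mean()                      # remove the DC component
    threshold = 0.5 * x.std()             # simple adaptive threshold
    min_gap = int(min_gap_s * fs)

    peaks, last_peak = [], -min_gap
    for i in range(1, len(x) - 1):
        # local maximum above the threshold, far enough from the previous beat
        if x[i] > threshold and x[i] >= x[i - 1] and x[i] >= x[i + 1]:
            if i - last_peak >= min_gap:
                peaks.append(i)
                last_peak = i

    duration_min = len(x) / fs / 60.0
    return len(peaks) / duration_min if duration_min > 0 else 0.0

# Usage: a 10-second synthetic signal at 60 Hz with a 1.2 Hz (72 bpm) beat.
t = np.arange(0, 10, 1 / 60.0)
signal = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(t.size)
print(round(pulse_rate_bpm(signal)))      # roughly 72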
The Visualization module provides a user interface for the supervisors. It enables them to watch
each working operator's physiological condition along with a preview of the selected video
source and the related sound stream. All incoming alarm messages are instantly signaled to the
supervisor. The Visualization module can be set to an off-line mode, where all the data
is fetched from the database. By watching all the recorded physiological parameters, alarms,
video and audio data, the supervisor is able to reconstruct the course of the selected operator's
duty.
2. EMOTION COMPUTING
Rosalind Picard (1997) describes why emotions are important to the computing
community. There are two aspects of affective computing: giving the computer the ability
to detect emotions and giving the computer the ability to express emotions. Not only are
emotions crucial for rational decision making as Picard describes, but emotion detection is
an important step to an adaptive computer system. An adaptive, smart computer system has been
driving our efforts to detect a person’s emotional state. An important element of incorporating
emotion into computing is productivity for the computer user. A study (Dryer & Horowitz,
1997) has shown that people with personalities that are similar to or complement each other
collaborate well. Dryer (1999) has also shown that people view their computer as having a
personality. For these reasons, it is important to develop computers which can work well
with their users.
2.1 Theory
2.2 Result
The data for each subject consisted of scores for four physiological assessments (GSA,
GSR, pulse, and skin temperature) for each of the six emotions (anger, disgust, fear,
happiness, sadness, and surprise) across the five-minute baseline and test sessions.
GSA data was sampled 80 times per second, GSR and temperature were reported
approximately 3-4 times per second, and pulse was recorded as each beat was detected,
approximately once per second. To account for individual variance in physiology, we
calculated the difference between the baseline and test scores. Scores that differed by more than
one and a half standard deviations from the mean were treated as missing. By this criterion,
twelve scores were removed from the analysis. The results show that the theory behind the Emotion
Mouse is fundamentally sound. The physiological measurements were correlated to
emotions using a correlation model. The correlation model is derived from a calibration process
in which a baseline attribute-to-emotion correlation is rendered based on statistical analysis
of calibration signals generated by users having emotions that are measured or
otherwise known at calibration time.
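A minimal sketch of this preprocessing step (the data layout, variable names and values are assumptions for illustration, not the original study's code):

import numpy as np

def baseline_differences(baseline, test, sd_cutoff=1.5):
    """Compute test-minus-baseline scores and mark outliers as missing.

    baseline, test: arrays of shape (subjects, measures), where the measures
    could be GSA, GSR, pulse and skin temperature for a given emotion.
    Scores further than sd_cutoff standard deviations from the mean of
    each measure are replaced by NaN (treated as missing).
    """
    diff = np.asarray(test, float) - np.asarray(baseline, float)
    mean = diff.mean(axis=0)
    sd = diff.std(axis=0)
    outlier = np.abs(diff - mean) > sd_cutoff * sd
    diff[outlier] = np.nan
    return diff

# Usage with made-up numbers for 4 subjects and 4 measures:
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 4))
test = base + rng.normal(scale=0.2, size=(4, 4))
print(baseline_differences(base, test))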
3. TYPE OF EMOTION SENSORS
1 For Hand:
o Emotion Mouse
o Sentic Mouse
2 For Eyes:
o Expression Glasses
o Magic Pointing
o Eye Tracking
3 For Voice:
o Artificial Intelligence Speech Recognition
3.1 HAND
3.2 EYE
4. ARTIFICIAL INTELLIGENCE SPEECH RECOGNITION
It is important to consider the environment in which the speech recognition system has to
work. The grammar used by the speaker and accepted by the system, the noise level, the noise type,
the position of the microphone, and the speed and manner of the user's speech are some factors that may
affect the quality of speech recognition. When you dial the telephone number of a big company,
you are likely to hear the sonorous voice of a cultured lady who responds to your call with
great courtesy, saying "Welcome to company X. Please give me the extension number
you want." You pronounce the extension number, your name, and the name of the person you
want to contact. If the called person accepts the call, the connection is given quickly. This is
artificial intelligence, where an automatic call-handling system is used without employing
any telephone operator.
Artificial intelligence (AI) involves two basic ideas. First, it involves studying the thought
processes of human beings. Second, it deals with representing those processes via
machines (like computers, robots, etc). AI is behavior of a machine, which, if performed
by a human being, would be called intelligent. It makes machines smarter and more useful,
and is less expensive than natural intelligence. Natural language processing (NLP) refers to
artificial intelligence methods of communicating with a computer in a natural language like
English. The main objective of an NLP program is to understand the input and initiate action. The
input words are scanned and matched against internally stored known words. Identification of a
key word causes some action to be taken. In this way, one can communicate with the computer
in one’s language. No special commands or computer language are required. There is no need to
enter programs in a special language for creating software.
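A toy sketch of this keyword-matching idea (the keywords and actions below are invented for illustration; they are not part of any particular NLP library):

# Map known keywords to actions; scanning the input for a keyword triggers the action.
ACTIONS = {
    "mail": lambda: print("Opening your mailbox..."),
    "dial": lambda: print("Dialing your friend at his office..."),
    "music": lambda: print("Playing some music to cheer you up..."),
}

def respond(sentence):
    """Scan the input words and run the action of the first known keyword."""
    for word in sentence.lower().split():
        if word in ACTIONS:
            ACTIONS[word]()
            return
    print("Sorry, I did not understand that.")

respond("Please dial my friend")   # -> Dialing your friend at his office...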
4.2 SPEECH RECOGNITION:
The user speaks to the computer through a microphone whose output is fed to a bank of filters; a
simple system may contain a minimum of three filters. The larger the number of filters used, the higher the
probability of accurate recognition. Presently, switched-capacitor digital filters are used
because these can be custom-built in integrated circuit form. These are smaller and cheaper than
active filters using operational amplifiers. The filter output is then fed to the ADC to translate the
analogue signal into a digital word. The ADC samples the filter outputs many times a second.
Each sample represents a different amplitude of the signal. Evenly spaced vertical lines represent
the amplitude of the audio filter output at the instant of sampling. Each value is then converted to
a binary number proportional to the amplitude of the sample. A central processing unit (CPU)
controls the input circuits that are fed by the ADCs. A large RAM (random access memory)
stores all the digital values in a buffer area. This digital information, representing the spoken
word, is then accessed by the CPU for further processing. Normal speech has a frequency
range of 200 Hz to 7 kHz. Recognizing a telephone call is more difficult as it has a bandwidth
limitation of 300 Hz to 3.3 kHz.
As explained earlier, the spoken words are processed by the filters and ADCs. The binary
representation of each of these words becomes a template or standard against which
future words are compared. These templates are stored in memory. Once the storing process
is completed, the system can go into its active mode and is capable of identifying spoken words.
As each word is spoken, it is converted into its binary equivalent and stored in RAM. The computer
then starts searching and compares the binary input pattern with the templates. It is to be noted
that even if the same speaker speaks the same text, there are always slight variations in the amplitude
or loudness of the signal, pitch, frequency difference, time gap, etc. For this reason, there is
never a perfect match between the template and the binary input word.
The values of the binary input words are subtracted from the corresponding values in the templates.
If both values are the same, the difference is zero and there is a perfect match. If not, the
subtraction produces some difference or error. The smaller the error, the better the match. When
the best match occurs, the word is identified and displayed on the screen or used in some other
manner.
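A bare-bones sketch of this template-matching step (the feature vectors and threshold here are invented placeholders; a real system would use the quantized filter-bank samples described above):

import numpy as np

# Stored templates: one feature vector (quantized filter outputs) per known word.
templates = {
    "open":  np.array([12, 40, 33, 7, 2], dtype=float),
    "close": np.array([ 3, 18, 45, 30, 9], dtype=float),
    "mail":  np.array([25, 10, 5, 38, 20], dtype=float),
}

def recognize(sample, max_error=25.0):
    """Subtract the sample from every template and pick the smallest error."""
    best_word, best_error = None, float("inf")
    for word, tpl in templates.items():
        error = np.abs(tpl - sample).sum()   # accumulated difference
        if error < best_error:
            best_word, best_error = word, error
    return best_word if best_error <= max_error else None

spoken = np.array([11, 42, 30, 8, 3], dtype=float)  # noisy version of "open"
print(recognize(spoken))   # -> open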
5. THE SIMPLE USER INTEREST TRACKER (SUITOR)
Computers would have been much more powerful had they gained the perceptual and sensory
abilities of living beings. What needs to be developed is an intimate relationship
between the computer and the human, and the Simple User Interest Tracker (SUITOR) is a
revolutionary approach in this direction. By observing the Web page a netizen is browsing,
SUITOR can help by fetching more information to his desktop. By simply noticing where
the user's eyes focus on the computer screen, SUITOR can be more precise in determining
his topic of interest. It can even deliver relevant information to a handheld device. The success
lies in how intimate SUITOR can be with the user. IBM's BlueEyes research project began
with a simple question, according to Myron Flickner, a manager in Almaden's USER group:
Can we exploit nonverbal cues to create more effective user interfaces? One such cue is gaze—
the direction in which a person is looking. Flickner and his colleagues have created some new
techniques for tracking a person's eyes and have incorporated this gaze-tracking technology into
two prototypes. One, called SUITOR (Simple User Interest Tracker), fills a scrolling ticker
on a computer screen with information related to the user's current task. SUITOR knows
where you are looking, what applications you are running, and what Web pages you may be
browsing. "If I'm reading a Web page about IBM, for instance," says Paul Maglio, the Almaden
cognitive scientist who invented SUITOR, "the system presents the latest stock price or business
news stories that could affect IBM. If I read the headline off the ticker, it pops up the story in a
browser window. If I start to read the story, it adds related stories to the ticker. That's the whole
idea of an attentive system—one that attends to what you are doing, typing, reading, so that it
can attend to your information needs."
6. THE BASICS OF FACE RECOGNITION
Face recognition is definitely one of the most popular computer vision problems. Thanks to its
popularity it has been well studied over the last 50 years. The first attempts to explore face
recognition were made in the 1960s; however, it was not until the 1990s, when Turk and Pentland
implemented the "Eigenfaces" algorithm, that this field showed some really exciting and useful
results.
Face recognition has recently been getting more and more attention, and we can anticipate a bright
future for this field.
Security was historically, and will remain in the future, the main practical application of face
recognition. Here face recognition can help with both identification and authentication. A good
example is the Frankfurt airport security system, which uses face recognition to automate
passenger control. Another application is the security analysis of videos captured by
external city camera systems. Potential suspects can be identified before committing a crime.
Take a look at the integration of face recognition in the London Borough of Newham, already in
1998.

Face recognition can also be used to speed up the identification of persons. We can imagine a
system which would recognize a client as soon as he walks into a branch (bank,
insurance), so that the front-office worker can welcome the client by his name and prepare his
folder before he actually gets to the counter.

Advertising companies are working on ad-boards which would adapt their content to the persons
passing by. After analyzing the person's face, commercials would adapt to the gender, age, or
even personal style. This usage, however, might not conform to privacy laws; private companies
do not have the right to film persons in public places (of course, depending on the country).

Not to forget that Google and Facebook have both implemented algorithms to identify users in
the huge databases of photos which they maintain as part of their social network services. Third-
party services, such as Face.com, offer image-based searching, which allows you to search, for
example, for pictures which contain all of your best friends together.
One of the latest usages comes from Google as well: the Face Unlock feature, which, as the name
says, enables you to unlock your phone after your face has been successfully recognized.

The latest news in face recognition comes thanks to new hardware equipment, especially 3D cameras.
3D cameras obtain much better results thanks to their ability to capture a three-dimensional image
of your face, solving the main issues of 2D face recognition (illumination, background detection).
See the example of Microsoft Kinect, which can recognize you as soon as you walk in front of the
camera.
We should keep in mind that face recognition will be used more and more in the future. This
applies not only to face recognition, but to the whole field of machine learning. The amount of
data generated every second forces us to find ways to analyze it. Machine learning will
help us find ways to get meaningful information from the data. Face recognition is just one
concrete method from this big area.

Several methods and algorithms have been developed since then, which makes orientation in the
field quite difficult for developers or computer scientists coming to face recognition for the
first time.

This section is intended as a starter to the subject, giving three pieces of information:
What the algorithms and methods used to perform face recognition are.
A full description of the "Eigenfaces" algorithm.
A fully functional example of face recognition using the EmguCV library and
Silverlight Web Camera features.
6.3 Face Recognition Process
Face detection - detecting the pixels in the image which represent the face. There are
several algorithms for performing this task; one of these, "Haar Cascade face detection", will
be used later in the example, but is not explained here.
Face recognition - the actual task of recognizing the face by analyzing the part of the
image identified during the face detection phase.
Face recognition brings several problems which are completely unique to this domain and
which make it one of the most challenging in the group of machine learning problems.
Illumination problem - due to the reflectivity of human skin, even a slight change in the
illumination of the image can widely affect the results.
Pose changes - any rotation of the head of a person will affect the performance.
Time delay - due to the aging of human individuals, the database has to
be regularly updated.
Appearance-based statistical methods use statistics to define different
ways of measuring the distance between two images. In other words, they try to find a
way to say how similar two faces are to each other. There are several methods which fall
into this group. The most significant are:
Principal Component Analysis (PCA) - described in this article.
Linear Discriminant Analysis
Independent Component Analysis
PCA is described in this article; the others are not. For a comparison of these methods, refer to the
literature.
Gabor Filters - filters commonly used in image processing that have the capability to
capture important visual features. These filters are able to locate important features in
the image, such as the eyes, nose or mouth. This method can be combined with the previously
mentioned analytical methods to obtain better results.
Neural Networks simulate the behavior of the human brain to perform machine
learning tasks such as classification or prediction. In our case we need the classification of
an image. The explanation of Neural Networks would take at least one entire article (if not
more). Basically, a Neural Network is a set of interconnected nodes. The edges
between the nodes are weighted, so the information which travels between two nodes is
amplified. The information travels from a set of input nodes, across a set of hidden nodes, to a
set of output nodes. The developer has to invent a way to encode the input (in this case an
image) into the set of input nodes and decode the output (in this case a label identifying the
person) from the set of output nodes.
A commonly used method is to take one node for each pixel in the image on the input side of
the network and one node for each person in the database on the output side, as illustrated in
the accompanying figure.
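A minimal sketch of this encoding (assuming tiny 10x10 grayscale images and a database of three people; the sizes, the single hidden layer and the random weights are arbitrary choices for illustration):

import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_hidden, n_people = 10 * 10, 32, 3   # one input node per pixel, one output per person

# Randomly initialized weights of a one-hidden-layer network.
W1 = rng.normal(scale=0.1, size=(n_hidden, n_pixels))
W2 = rng.normal(scale=0.1, size=(n_people, n_hidden))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(image):
    """Forward pass: pixels -> hidden layer -> one score per person."""
    x = image.reshape(-1) / 255.0          # encode the image into the input nodes
    hidden = sigmoid(W1 @ x)
    scores = sigmoid(W2 @ hidden)
    return int(np.argmax(scores))          # decode: index of the most active output node

fake_face = rng.integers(0, 256, size=(10, 10))
print("predicted person:", classify(fake_face))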
6.5 Eigenfaces algorithm and PCA
The eigenfaces algorithm follows the same pattern as other statistical methods (LDA, ICA):
Compute the distance between the captured image and each of the images in the database.
Select the example from the database which is closest to the processed image (the one
with the smallest distance to the captured image).
If the distance is not too big, label the image as that person (see the sketch after this list).
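A compact sketch of this nearest-neighbor loop (the names, feature vectors and threshold are placeholders; a real system would use the eigenface coefficients described below):

import numpy as np

# Feature vectors for the known people in the database (placeholder values).
database = {
    "alice": np.array([0.10, 0.04, 0.01]),
    "bob":   np.array([0.50, 0.20, 0.30]),
}

def recognize(probe, threshold=0.2):
    """Return the closest person, or None if the smallest distance is too big."""
    best_name, best_dist = None, float("inf")
    for name, vector in database.items():
        dist = np.linalg.norm(vector - probe)    # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None

print(recognize(np.array([0.11, 0.05, 0.02])))   # -> alice
print(recognize(np.array([0.90, 0.90, 0.90])))   # -> None (unknown face)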
The crucial question is: how do we express the distance between two images? One possibility
would be to compare the images pixel by pixel. But we can immediately feel that this would not
work. Each pixel would contribute the same amount to the comparison, but not every pixel holds
valuable information. For example, background and hair pixels would arbitrarily make the distance
larger or smaller. Also, for direct comparison we would need the faces to be perfectly aligned in
all pictures, and we would have to hope that the rotation of the head was always the same.

To overcome this issue the PCA algorithm creates a set of principal components, which are
called eigenfaces. Eigenfaces are images that represent the main differences between all the
images in the database.
The recognizer first finds an average face by computing the average of each pixel over all the images.
Each eigenface then represents differences from this average face. The first eigenface represents
the most significant differences between all the images and the average image, and the last one the
least significant differences.
The average image was created by analyzing the faces of 10 consultants working at OCTO
Technology, with 5 images of each consultant.
19
The first 5 eigenfaces are then computed from the same set of images.
Now that we have the average image and the eigenfaces, each image in the database can be
represented as a composition of these.
Let's say:
Image1 = Average Image + 10% Eigenface 1 + 4% Eigenface 2 + … + 1% Eigenface 5
This basically means that we are able to express each image as a vector of percentages; to
our recognizer the previous image becomes just the vector [0.10, 0.04, …, 0.01]. The previous
equation is a slight simplification of the subject. You might be asking yourself how the coefficients
of each eigenface are computed: in short, they are obtained by projecting the mean-subtracted image
onto each eigenface.
Now that we have expressed each image as a simple vector, we are able to say what the
distance between two images is. Getting the distance between two vectors is not complicated and most of
us remember it from school. If in 2D space we have the points [x1, y1] and [x2, y2], we know
that the distance between these two can be computed as shown below.
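The distance is the familiar Euclidean one; written out, and generalized to the n coefficients of an eigenface vector, it is:

d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}, \qquad d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}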
6.7 Behind The Scenes
This part gives some more insight into how exactly the eigenfaces are computed and
what is going on behind the scenes. Feel free to skip it if you do not want to go deeper into the
details.

The key point of PCA is the reduction of the dimensional space. The question we could pose
ourselves is how we would compare the images without eigenfaces. We would simply have to compare
each pixel. An image with a resolution of 50 x 50 pixels would give us 2500 pixels; in
other words, we would have a space of 2500 dimensions. When comparing using the eigenfaces,
we manage to reduce the dimensional space: the number of eigenfaces is the number of
dimensions in our new space.
Remember the equation for the distance between two points: in this equation each pixel would
contribute to the distance between images. But not every pixel holds meaningful
information. The background behind the face, the cheeks, the forehead, the hair - these are the
pixels which do not give meaningful information about the face. Instead, the eyes, nose and ears are
important. In terms of image processing, a lot of pixels just bring noise into the computation of
the distance. One of the main ideas of PCA is the reduction of this noise by reducing the number
of dimensions.
You may be asking yourself, how exactly can we reduce the number of dimensions? It is not
easy to imagine the transition from a space where each pixel is one dimension to a space where
each eigenface is one dimension. To understand this transition, take a look at this example:
21
In two-dimensional space, each point is defined by its two coordinates. However, if we know
that several points lie on the same line, we could identify the position of each point only by
knowing its position along the line. To obtain the x and y coordinates of the point we would need
to know the slope of the line and the position of the point on the line. We have reduced the
dimension by one. In this example the slope of the line becomes the principal component, the
same way our eigenfaces are principal components in the face recognition algorithm. Note
that we can even estimate the position of points which are not on the line by using the
projection of the point onto the line (the case of the last point).
So is the eigenfaces algorithm all there is to face recognition? Well, yes and no. The eigenfaces
algorithm is the base of the research done in face recognition. Other
analytic methods such as Linear Discriminant Analysis and Independent Component Analysis
build on the foundations defined by the eigenfaces algorithm. Gabor filters are used to identify
the important features in the face, and the eigenfaces algorithm can later be used to compare these
features.

Neural Networks are a complex subject, but it has been shown that they rarely have better
performance than the eigenfaces algorithm. Sometimes the image is first described as a linear
combination of eigenfaces and then its describing vector is passed to the Neural Network. In
other words, the eigenfaces algorithm really builds the base of face recognition.
There are several open-source libraries which have implemented one or more of these methods;
however, as developers we will not be able to use them well without understanding how these
algorithms work.
7. FACE RECOGNITION: DIFFERENT APPROACHES
Face recognition is an evolving area, changing and improving constantly. Many research areas
affect face recognition: computer vision, optics, pattern recognition, neural networks, machine
learning, psychology, and so on. Previous sections explain the different steps of a face recognition
process. However, these steps can overlap or change depending on the bibliography consulted.
There is no consensus in that regard. All these factors hinder the development of a unified
face recognition algorithm classification scheme. This section explains the most cited criteria.

Face recognition algorithms can be classified as either geometry-based or template-based
algorithms. The template-based methods compare the input image with a set of templates. The
set of templates can be constructed using statistical tools like Support Vector Machines (SVM),
Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Independent
Component Analysis (ICA), Kernel Methods, or Trace Transforms. The geometry feature-based
methods analyze local facial features and their geometric relationships. This approach is sometimes
called the feature-based approach; examples are some Elastic Bunch Graph Matching algorithms. This
approach is less used nowadays. There are algorithms developed using both approaches. For
instance, a 3D morphable model approach can use feature points or texture as well as PCA to
build a recognition system.
7.2 Piecemeal/Holistic approaches
Faces can often be identified from little information. Some algorithms follow this idea,
processing facial features independently. In other words, the relation between the features, or the
relation of a feature with the whole face, is not taken into account. Many early researchers
followed this approach, trying to deduce the most relevant features. Some approaches tried to use
the eyes, a combination of features, and so on. Some Hidden Markov Model (HMM) methods
also fall into this category. Although feature processing is very important in face recognition, the
relation between features (configural processing) is also important. In fact, facial features are
processed holistically. That is why nowadays most algorithms follow a holistic approach.
Facial recognition methods can also be divided into appearance-based and model-based algorithms.
The differentiating element of these methods is the representation of the face. Appearance-based
methods represent a face in terms of several raw intensity images. An image is considered as a
high-dimensional vector. Statistical techniques are then usually used to derive a feature space
from the image distribution, and the sample image is compared to the training set. On the other
hand, the model-based approach tries to model a human face. The new sample is fitted to the
model, and the parameters of the fitted model are used to recognize the image. Appearance methods
can be classified as linear or non-linear, while model-based methods can be 2D or 3D. Linear
appearance-based methods perform a linear dimension reduction: the face vectors are projected
onto the basis vectors, and the projection coefficients are used as the feature representation of each face
image. Examples of this approach are PCA, LDA and ICA. Non-linear appearance methods are
more complicated; in fact, linear subspace analysis is an approximation of a non-linear manifold.
Kernel PCA (KPCA) is a widely used method. Model-based approaches can be 2-dimensional or
3-dimensional. These algorithms try to build a model of a human face. These models are often
morphable. A morphable model allows faces to be classified even when pose changes are present. 3D
models are more complicated, as they try to capture the three-dimensional nature of human faces.
Examples of this approach are Elastic Bunch Graph Matching and 3D Morphable Models.
A similar separation of pattern recognition algorithms into four groups is proposed by Jain and
colleagues in [48]. We can group face recognition methods into three main groups. The following
approaches are proposed:
Template matching. Patterns are represented by samples, models, pixels, curves or textures.
The recognition function is usually a correlation or distance measure.
Statistical approach. Patterns are represented as features. The recognition function is a
discriminant function.
Neural networks. The representation may vary. There is a network function at some point.
Note that many algorithms, especially current complex algorithms, may fall into more than one of
these categories. The most relevant face recognition algorithms will be discussed later under this
classification.
Many face recognition algorithms include some template matching techniques. A template
matching process uses pixels, samples, models or textures as patterns. The recognition function
computes the differences between these features and the stored templates, using correlation or
distance measures. Although the matching of 2D images was the early trend, nowadays 3D
templates are more common. The 2D approaches are very sensitive to orientation and illumination
changes. One way of addressing this problem is using Elastic Bunch Graphs to represent images
[114]. Each subject has a bunch graph for each of its possible poses. Facial features are
extracted from the test image to form an image graph. This image graph can be compared to the
model graphs, matching the right class.
The introduction of 3D models is motivated by the potential ability of three-dimensional patterns
to be unaffected by those two factors. The problem is that 3D data must be acquired through 3D
scans, under controlled conditions. Moreover, in most cases it requires the collaboration of the
subject to be recognized. Therefore, in applications such as surveillance systems, this kind of 3D
data may not be available during the recognition process. This is why there is a tendency to build
training sets using 3D models, but to gather 2D images for recognition. Techniques that
construct 3D models from 2D data are being developed in this context.
Blanz and Vetter state in [13] that there are different ways of separating the shape and orientation of
a face in 3D models: matching feature vertices to image positions and then interpolating
deformations of the surface, or using restricted class-specific deformations, defined manually or
automatically, from non-textured or textured head scans. Separation between texture and
illumination is achieved using illumination models that consider illumination direction and
intensity, with Lambertian or non-Lambertian reflectance. The initial constraint of systems like
the cited one is that the database of faces is obtained via 3D scans, so there is a need to build a
solid 3D model database. Another constraint is that some feature points must be defined manually.
The recognition process is done by building a 3D model of the subject. This 3D
model is then compared with the stored patterns using two parameters: shape and texture. Their
algorithm achieves a performance of around 95.5%. Therefore, 3D models are a powerful
representation of human faces for recognition purposes. They have huge potential towards pose-
and illumination-invariant face recognition.
This solid representation of faces has been used in other algorithms for recognition purposes.
However, most current algorithms take advantage of statistical tools like PCA, computational
models and classifiers. Although pure sample-model matching systems are not viable, face
templates are a widely used tool in face recognition.
One interesting tool is the Active Appearance Model. Its goal is to minimize the difference
between the model and the input image by varying the model parameters c. There is a
parameter controlling shape,

x = x̄ + Qs c,

where x̄ is the mean shape and Qs is the matrix defining the modes of shape variation.
The transformation function St is typically described by a scaling (s cos θ − 1, s sin θ),
an in-plane rotation θ and a translation (tx, ty). The pose parameter vector
t = (s cos θ − 1, s sin θ, tx, ty)^T is then zero for an identity transformation, and
S(t+δt)(x) ≈ St(Sδt(x)). There is also a texture parameter

g = ḡ + Qg c,

where ḡ is the mean texture in a mean-shaped patch and Qg is the matrix describing the modes of
texture variation.
The texture in the image frame is defined as g_im = Tu(g) = (u1 + 1) g + u2, where u is the
transformation parameter vector. It is zero for an identity transformation and
T(u+δu)(x) ≈ Tu(Tδu(x)).
The parameters c and t define the position of the model points. During matching we sample the
pixels and project them into the texture model frame. The current difference between the model and
the image (measured in the normalized texture frame) is the quantity to be minimized.
The number of values needed to specify a data point (a face image) is very large; the dimensionality
of this data is too high. Therefore, the goal is to choose and apply
the right statistical tool for extraction and analysis of the underlying manifold. These tools must
define the embedded face space within the image space and extract the basis functions from the face
space. This would permit patterns belonging to different classes to occupy disjoint and
compact regions in the feature space. Consequently, we would be able to define a line, curve,
plane or hyperplane that separates faces belonging to different classes.
Many of these statistical tools are not used alone. They are modified or extended by researchers
in order to get better results. Some of them are embedded into bigger systems, or they are just one
part of a recognition algorithm. Many of them can be found alongside classification methods, like a
DCT embedded in a Bayesian Network [83] or a Gabor Wavelet used with a Fuzzy Multilayer
Perceptron.
Figure: PCA. x and y are the original basis. φ is the first principal component
One of the most used and cited statistical methods is Principal Component Analysis (PCA). It
is a mathematical procedure that performs a dimensionality reduction by extracting the principal
components of multi-dimensional data. The first principal component is the linear
combination of the original dimensions that has the highest variability. The n-th principal
component is the linear combination with the maximum variability that is orthogonal to the n-1
first principal components. The idea of PCA is illustrated in the accompanying figure: the greatest
variance of any projection of the data lies along the first coordinate, and the n-th coordinate
will be the direction of the n-th maximum variance - the n-th principal component.

Usually the mean x̄ is subtracted from the data, so that PCA is equivalent to the Karhunen-Loeve
Transform (KLT). Let X (of size n×m) be the data matrix, where x1,...,xm are the image vectors
(column vectors) and n is the number of pixels per image. The KLT basis is obtained by solving the
eigenvalue problem

Cx = Φ Λ Φ^T

where Cx is the covariance matrix of the data, Φ is the matrix of eigenvectors and Λ is the diagonal
matrix of eigenvalues. In practice the eigenvectors can be obtained efficiently without explicitly
computing the full covariance matrix Cx. The embedding is done by yi = Φ^T xi, thus obtaining the
mapped points y1,...,ym.
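To make this concrete, here is a small numpy sketch of PCA for eigenfaces (a sketch only, using the common trick of eigendecomposing the small m×m matrix instead of the full pixel covariance; the image size and count are placeholders):

import numpy as np

def fit_eigenfaces(images, n_components=5):
    """images: array of shape (m, n_pixels). Returns the mean face and the eigenfaces."""
    X = np.asarray(images, dtype=float)
    mean = X.mean(axis=0)
    A = X - mean                                  # mean-centered data, shape (m, n_pixels)
    # Eigendecompose the small m x m matrix A A^T instead of the n x n covariance.
    small_cov = A @ A.T
    eigvals, eigvecs = np.linalg.eigh(small_cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigenfaces = (A.T @ eigvecs[:, order]).T      # map back to pixel space
    eigenfaces /= np.linalg.norm(eigenfaces, axis=1, keepdims=True)
    return mean, eigenfaces

def project(image, mean, eigenfaces):
    """Express an image as a coefficient vector over the eigenfaces."""
    return eigenfaces @ (np.asarray(image, float) - mean)

# Usage with random stand-in "faces" of 50x50 = 2500 pixels.
rng = np.random.default_rng(1)
faces = rng.random((10, 2500))
mean, eigenfaces = fit_eigenfaces(faces)
print(project(faces[0], mean, eigenfaces))        # 5 coefficients describing face 0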
9.2 Discrete Cosine Transform
The Discrete Cosine Transform [2], DCT-II standard (often called simply DCT), expresses a
sequence of data points in terms of a sum of cosine functions oscillating at different frequencies.
It has strong energy compaction properties; therefore, it can be used to transform images,
compacting the variations and allowing an effective dimensionality reduction. It has been
widely used for data compression. The DCT is based on the discrete Fourier transform, but uses
only real numbers. When a DCT is performed over an image, the energy is compacted in the
upper-left corner. An example face taken from the ORL database, with a DCT performed over it,
illustrates this. Let B be the DCT of an input image A of size M×N:
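(Assuming the standard 2D DCT-II definition:)

B_{pq} = \alpha_p \alpha_q \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} A_{mn}
         \cos\frac{\pi (2m+1) p}{2M} \cos\frac{\pi (2n+1) q}{2N},
\qquad
\alpha_p = \begin{cases} 1/\sqrt{M}, & p = 0 \\ \sqrt{2/M}, & 1 \le p \le M-1 \end{cases}
\quad
\alpha_q = \begin{cases} 1/\sqrt{N}, & q = 0 \\ \sqrt{2/N}, & 1 \le q \le N-1 \end{cases}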
where M is the row size and N is the column size of A. We can truncate the matrix B, retaining
the upper-left area, which has the most information, reducing the dimensionality of the problem.
9.3 Linear Discriminant Analysis

LDA is widely used to find linear combinations of features while preserving class separability.
Unlike PCA, LDA tries to model the differences between classes. Classic LDA is designed to
take into account only two classes. Specifically, it requires data points of different classes to be
far from each other, while points from the same class are close. Consequently, LDA obtains
differentiated projection vectors for each class. Multi-class LDA algorithms, which can manage
more than two classes, are more commonly used. Suppose we have m samples x1,...,xm belonging to c
classes; each class k has mk elements. We assume that the mean has been subtracted from the
samples, as in PCA. The objective function of LDA can be defined as
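(assuming the usual Fisher criterion, with between-class scatter matrix S_B and within-class scatter matrix S_W):

W^{*} = \arg\max_{W} \frac{\left| W^{T} S_B W \right|}{\left| W^{T} S_W W \right|},
\qquad
S_B\, w = \lambda\, S_W\, w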
The solution of this eigenproblem provides the eigenvectors; the embedding is then done as in the
PCA algorithm.
9.4 Locality Preserving Projections
Locality Preserving Projections (LPP) were introduced by He and Niyogi. It is an alternative
to PCA, designed to preserve the locality structure of the data. Pattern recognition algorithms usually
search for the nearest pattern or neighbors; therefore, the locality-preserving quality of LPP can
speed up recognition.
Let m be the number of points that we want to map. In our case, those points correspond to
images. The LPP algorithm has four steps:
Constructing the adjacency graph: a graph G with m nodes is built using, for example, the k-NN
algorithm.
Choosing the weights: W being a weight matrix, we can build it using a heat kernel of
parameter t; if nodes i and j are connected, Wij is set according to the heat kernel (see the
formulas after this list).
Solving the eigenproblem: D is a diagonal matrix whose elements are defined as dii = Σj Wij,
and L = D − W is the Laplacian matrix. The eigenproblem given after this list must be solved.
Computing the embedding: the embedding process is analogous to that of PCA.
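For reference, assuming the standard LPP formulation of He and Niyogi, the heat-kernel weights and the generalized eigenproblem referred to in the list above are:

W_{ij} = e^{-\lVert x_i - x_j \rVert^{2} / t}
\qquad\text{and}\qquad
X L X^{T} a = \lambda\, X D X^{T} a

where X is the matrix whose columns are the data points x1,...,xm; the eigenvectors a associated with the smallest eigenvalues give the projection directions.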
9.5 Gabor Wavelets
Neurophysiological evidence from the visual cortex of mammalian brains suggests that simple
cells in the visual cortex can be viewed as a family of self-similar 2D Gabor wavelets. The Gabor
functions proposed by Daugman are local spatial bandpass filters that achieve the theoretical limit
for conjoint resolution of information in the 2D spatial and 2D Fourier domains. Daugman
generalized the Gabor function to the following 2D form:
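(the commonly used Daugman/Lades form is assumed here):

\Psi_i(\mathbf{x}) = \frac{\lVert \mathbf{k}_i \rVert^{2}}{\sigma^{2}}
\exp\left( -\frac{\lVert \mathbf{k}_i \rVert^{2} \lVert \mathbf{x} \rVert^{2}}{2\sigma^{2}} \right)
\left[ \exp\left( i\, \mathbf{k}_i \cdot \mathbf{x} \right) - \exp\left( -\frac{\sigma^{2}}{2} \right) \right],
\qquad
\mathbf{k}_i = k_v \begin{pmatrix} \cos\theta_\alpha \\ \sin\theta_\alpha \end{pmatrix}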
Each Ψi is a plane wave characterized by the vector ki, enveloped by a Gaussian function, where
σ is the standard deviation of this Gaussian. The center frequency of the i-th filter is given by the
characteristic wave vector ki,
whose scale and orientation are given by (kv, θα), with v the spatial frequency number and α
the orientation index.
An image is represented by the Gabor wavelet transform in four dimensions: two are the spatial
dimensions, and the other two represent spatial frequency structure and spatial relations or
orientation. Processing the face image with Gabor filters at 5 spatial frequencies (v = 0,...,4)
and 8 orientations (α = 0,...,7) captures the whole frequency spectrum, so we have 40 wavelets.
The amplitudes of the Gabor filter responses are used for recognition.
Once the transformation has been performed, different techniques can be applied to extract the
relevant features, such as comparisons of high-energized points.
Then there are rotation operations which derive independent components by minimizing mutual
information. Finally, a normalization is carried out.
9.7 Kernel PCA
The use of kernel functions for performing nonlinear PCA was introduced by Scholkopf et al.
[97]. Its basic methodology is to apply a non-linear mapping Ψ(x) : R^N → R^L to the input and
then solve a linear PCA in the resulting feature space. The mapping Ψ(x) is made
implicitly using kernel functions k(xi, xj) = <Ψ(xi), Ψ(xj)>, so that dot products in the input space
correspond to dot products in the higher-dimensional feature space.
Assuming that the projected data have been centered, the covariance is given by
Cx = <Ψ(xi) Ψ(xi)^T>, and the resulting eigenproblem is solved through the kernel matrix.
The kernel matrix K is then diagonalized by PCA. This leads to the conclusion that the n-th
principal component yn of x is given by
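(stated in the standard kernel-PCA form, which is assumed here):

m \lambda\, \alpha = K \alpha,
\qquad
y_n = \langle V_n, \Psi(x) \rangle = \sum_{i=1}^{m} \alpha_i^{n}\, k(x_i, x)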
where Vn is the n-th eigenvector of the feature space defined by Ψ, and the αi are the components
of the corresponding eigenvector of K. The selection of an optimal kernel is another engineering
problem; typical kernels include Gaussian and polynomial kernels.
Other algorithms are worth mentioning. For example, genetic algorithms have been used, and
have proved more accurate (but more resource-consuming) than PCA or LDA. Other successful
statistical tools include Bayesian networks, bi-dimensional regression, and ensemble-based and
other boosting methods.
10. NEURAL NETWORK APPROACH
Artificial neural networks are a popular tool in face recognition. They have been used in pattern
recognition and classification. Kohonen was the first to demonstrate that a neural network could
be used to recognize aligned and normalized faces. Many different methods based on neural
networks have been proposed since then. Some of these methods use neural networks just for
classification. One approach is to use decision-based neural networks, which classify pre-
processed and subsampled face images.
There are methods which perform feature extraction using neural networks. For example, Intrator
et al. proposed a hybrid, semi-supervised method. They combined unsupervised methods for
extracting features with supervised methods for finding features able to reduce the classification error.
They used feed-forward neural networks (FFNN) for classification. They also tested their
algorithm using additional bias constraints, obtaining better results, and demonstrated that
they could decrease the error rate by training several neural networks and averaging over their
outputs, although this is more time-consuming than the simple method.
Lawrence et al. used a self-organizing map neural network and convolutional networks. Self-
organizing maps (SOM) are used to project the data into a lower-dimensional space, and a
convolutional neural network (CNN) provides partial translation and deformation invariance. Their
method was evaluated by substituting the SOM with PCA and the CNN with a multi-layer
perceptron (MLP) and comparing the results. They conclude that a convolutional network is
preferable to an MLP when no prior knowledge is incorporated. The SOM seems to be
computationally costly and can be substituted by PCA without loss of accuracy.
Overall, FFNN and CNN classification methods are not optimal in terms of computational time
and complexity. Their classification performance is bounded above by that of the eigenface approach
but is more costly to implement in practice.
Zhang and Fulcher presented an artificial neural network Group-based Adaptive Tolerance
(GAT) tree model for translation-invariant face recognition in 1996. Their algorithm was
developed with the idea of implementing it in an airport surveillance system. The algorithm's
input was passport photographs. This method builds a binary tree whose nodes are neural
network group-based nodes. Each node is thus a complex classifier, with an MLP as the basic neural
network of each group-based node.
Other authors used probabilistic decision-based neural networks (PDBNN). Lin et al. developed
a face detection and recognition algorithm using this kind of network [64]. They applied it to
face detection, feature extraction and classification. This network deploys one sub-net for each
class, approximating the decision region of each class locally. The inclusion of probability
constraints lowered the false acceptance and false rejection rates.
Bhuiyan et al. proposed in 2007 a neural network method combined with Gabor filters [10]. Their
algorithm achieves face recognition by implementing a multilayer perceptron with the back-
propagation algorithm. First there is a pre-processing step: every image is normalized in terms
of contrast and illumination, and noise is reduced by a "fuzzily skewed" filter. This filter works by
applying fuzzy membership to the neighboring pixels of the target pixel; it uses the median value as
the membership value 1 and reduces the extreme values, taking advantage of both the median filter and
the average filter.
Then each image is processed through a Gabor filter. The filter is represented as a complex
sinusoidal signal modulated by a Gaussian kernel function. The Gabor filter has five orientation
parameters and three spatial frequencies, so there are 15 Gabor wavelets. The architecture of the
neural network is illustrated in the figure.
For each face image, the outputs are 15 Gabor images which record the variations measured by
the Gabor filters. The first layer receives the Gabor features; the number of nodes is equal to the
dimension of the feature vector containing the Gabor features. The number of output nodes equals
the number of people the system must recognize. The training of the network with the backpropagation
algorithm follows this procedure (see the sketch after this list):
Initialization of the weights and threshold values.
Iterative process until the termination condition is fulfilled:
o Activate, applying the inputs and desired outputs. Calculate the actual outputs of the neurons
in the hidden and output layers, using the sigmoid activation function.
o Update the weights, propagating backward the errors between the desired and actual outputs.
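A toy illustration of one such training iteration with the sigmoid activation (a generic two-layer network; the sizes, learning rate and data are arbitrary, and this is not the authors' code):

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 15, 8, 4        # e.g. 15 Gabor features, 4 known people
W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))
W2 = rng.normal(scale=0.5, size=(n_out, n_hidden))
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target):
    """One activation pass plus one backward weight update (backpropagation)."""
    global W1, W2
    h = sigmoid(W1 @ x)                  # hidden layer activation
    y = sigmoid(W2 @ h)                  # output layer activation
    # Error terms (delta rule with the sigmoid derivative y*(1-y)).
    delta_out = (target - y) * y * (1 - y)
    delta_hidden = (W2.T @ delta_out) * h * (1 - h)
    # Weight updates, propagating the error backwards.
    W2 += lr * np.outer(delta_out, h)
    W1 += lr * np.outer(delta_hidden, x)
    return float(((target - y) ** 2).sum())

x = rng.random(n_in)                     # one Gabor feature vector
target = np.array([1.0, 0.0, 0.0, 0.0])  # desired output: person number 0
for epoch in range(200):
    err = train_step(x, target)
print("final squared error:", round(err, 4))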
Hidden Markov Models (HMM) are a statistical tool also used in face recognition. They have been
used in conjunction with neural networks. Bevilacqua et al. have developed a neural network that
trains a pseudo two-dimensional HMM.
The composition of a 1-dimensional HMM is illustrated in the corresponding figure. An HMM λ can be
defined as λ = (A, B, Π), where:
A = [aij] is the state transition probability matrix, where aij is the probability that state i
becomes state j.
B = [bj(k)] is the observation probability matrix, where bj(k) is the probability of having
the observation k when the state is j.
Π = {π1,...,πn} is the initial state distribution, where πi is the probability associated with
state i.
They propose a pseudo 2D-HMM, defining superstates formed by states, with 3-6-6-6-3
being the most successful configuration, as shown in the figure. The input of this 2D-HMM process is
the output of the artificial neural network (ANN).
The ANN provides the algorithm with the proper dimensionality reduction, so the inputs to
the 2D-HMM are images compressed into observation vectors of binary elements. The ANN,
using the error backpropagation algorithm, extracts the main features and stores them in a 50-bit
sequence. The input face image is divided into 103 segments of 920 pixels, and each segment is
divided into four 230-pixel features. So the first and last layers are formed by 230 neurons each,
and the hidden layer is formed by 50 nodes; a section of 920 pixels is thus compressed into four
sub-windows of 50 binary values each. The training function is iterated 200 times for each photo.
Finally, the ANN is tested with images similar to the input image, repeating this process for each
image. This method showed promising results, achieving 100% accuracy on the ORL database.
The introduction of fuzzy mathematics into neural networks for face recognition is another
approach. Bhattacharjee et al. developed in 2009 a face recognition system using a fuzzy
multilayer perceptron (MLP) [9]. The idea behind this approach is to capture decision surfaces in
non-linear manifolds, a task that a simple MLP can hardly complete.
The feature vectors are obtained using Gabor wavelet transforms, with a method similar to the
one presented above. The output vectors obtained from that step must then be
fuzzified. This process is simple: the closer a feature vector is to the class mean
vector, the higher its fuzzy value; as the difference between the two vectors increases, the
fuzzy value approaches 0.
The selected neural network is an MLP using back-propagation. There is one network for each class:
the examples of this class form class one, and
the examples of all the other classes form class two. Thus, it is a two-class classification problem.
The fuzzification of the neural network is based on the following idea: the patterns whose class
is less certain should have a lesser role in weight adjustment. So, for a two-class (i and j) fuzzy
partition φi(xk), k = 1,...,p, of a set of p input vectors, the membership is computed from di,
the distance of the vector from the mean of class i. The constant c controls the rate at
which the fuzzy membership decreases towards 0. The contribution of xk to the weight update is given
by |φ1(xk) − φ2(xk)|^m, where m is a constant, and the rest of the process follows the usual MLP
procedure. The results of the algorithm show a 2.125% error rate on the ORL database.
11. CONCLUSION
The nineties witnessed quantum leaps in interface design for improved man-machine
interaction. The BLUE EYES technology ensures a convenient way of simplifying life by
providing more delicate and user-friendly facilities in computing devices. Now that the method has
been proven, the next step is to improve the hardware. Instead of using cumbersome
modules to gather information about the user, it will be better to use smaller and less intrusive
units. The day is not far when this technology will push its way into your household, making
you even lazier. It may even reach your handheld mobile device. Anyway, this is only a
technological forecast.