International Journal of Signal Processing Volume 2 Number 2
Face Recognition: A Literature Review
A. S. Tolba, A.H. El-Baz, and A.A. El-Harby
assumed to be co-operative and makes an identity claim.
Computationally this means that it is not necessary to
consult the complete set of database images (denoted model
images below) in order to verify a claim. An incoming image
(referred to as a probe image) is thus compared to a small
number of model images of the person whose identity is
claimed and not, as in the recognition scenario, with every
image (or some descriptor of an image) in a potentially large
database. Second, an automatic authentication system must
operate in near-real time to be acceptable to users. Finally, in
recognition experiments, only images of people from the
training database are presented to the system, whereas the case
of an imposter (most likely a previously unseen person) is of
utmost importance for authentication.
Face recognition is a biometric approach that employs
automated methods to verify or recognize the identity of a
living person based on his/her physiological characteristics. In
general, a biometric identification system makes use of either
physiological characteristics (such as a fingerprint, iris pattern,
or face) or behaviour patterns (such as hand-writing, voice, or
key-stroke pattern) to identify a person. Because of human
inherent protectiveness of his/her eyes, some people are
reluctant to use eye identification systems. Face recognition
has the benefit of being a passive, non intrusive system to
verify personal identity in a “natural” and friendly way.
In general, biometric devices can be explained with a threestep procedure (1) a sensor takes an observation. The type of
sensor and its observation depend on the type of biometric
devices used. This observation gives us a “Biometric
Signature” of the individual. (2) a computer algorithm
“normalizes” the biometric signature so that it is in the same
format (size, resolution, view, etc.) as the signatures on the
system’s database. The normalization of the biometric
signature gives us a “Normalized Signature” of the individual.
(3) a matcher compares the normalized signature with the set
(or sub-set) of normalized signatures on the system's database
and provides a “similarity score” that compares the
individual's normalized signature with each signature in the
database set (or sub-set). What is then done with the similarity
scores depends on the biometric system’s application?
Face recognition starts with the detection of face patterns in
sometimes cluttered scenes, proceeds by normalizing the face
images to account for geometrical and illumination changes,
possibly using information about the location and appearance
of facial landmarks, identifies the faces using appropriate
classification algorithms, and post processes the results using
model-based schemes and logistic feedback [3].
The application of face recognition technique can be
categorized into two main parts: law enforcement application
and commercial application. Face recognition technology is
Abstract—The task of face recognition has been actively
researched in recent years. This paper provides an up-to-date review
of major human face recognition research. We first present an
overview of face recognition and its applications. Then, a literature
review of the most recent face recognition techniques is presented.
Description and limitations of face databases which are used to test
the performance of these face recognition algorithms are given. A
brief summary of the face recognition vendor test (FRVT) 2002, a
large scale evaluation of automatic face recognition technology, and
its conclusions are also given. Finally, we give a summary of the
research results.
Keywords—Combined classifiers,
matching, neural networks.
F
face
recognition,
graph
I. INTRODUCTION
ACE recognition is an important research problem
spanning numerous fields and disciplines. This because
face recognition, in additional to having numerous practical
applications such as bankcard identification, access control,
Mug shots searching, security monitoring, and surveillance
system, is a fundamental human behaviour that is essential for
effective communications and interactions among people.
A formal method of classifying faces was first proposed in
[1]. The author proposed collecting facial profiles as curves,
finding their norm, and then classifying other profiles by their
deviations from the norm. This classification is multi-modal,
i.e. resulting in a vector of independent measures that could be
compared with other vectors in a database.
Progress has advanced to the point that face recognition
systems are being demonstrated in real-world settings [2]. The
rapid development of face recognition is due to a combination
of factors: active development of algorithms, the availability
of a large databases of facial images, and a method for
evaluating the performance of face recognition algorithms.
In the literatures, face recognition problem can be
formulated as: given static (still) or video images of a scene,
identify or verify one or more persons in the scene by
comparing with faces stored in a database.
When comparing person verification to face recognition,
there are several aspects which differ. First, a client – an
authorized user of a personal identification system – is
Manuscript received February 22, 2005.
A. S. Tolba is with the Information Systems Department, Mansoura
University, Egypt, (e-mail: tolba1954@)yahoo.com).
A. H. EL-Baz is with the Mathematics Department, Damietta Faculty of
Science, New Damietta, Egypt, and doing PhD research on pattern
recognition
(phone: 0020-57-403980; Fax: 0020-57–403868; e-mail:
ali_elbaz@yahoo.com).
A. H. EL-Harby is with the Mathematics Department, Damietta Faculty of
Science, New Damietta, Egypt, (e-mail: elharby@yahoo.co.uk).
88
International Journal of Signal Processing Volume 2 Number 2
primarily used in law enforcement applications, especially
Mug shot albums (static matching) and video surveillance
(real-time matching by video image sequences). The
commercial applications range from static matching of
photographs on credit cards, ATM cards, passports, driver’s
licenses, and photo ID to real-time matching with still images
or video image sequences for access control. Each application
presents different constraints in terms of processing.
All face recognition algorithms consistent of two major
parts: (1) face detection and normalization and (2) face
identification. Algorithms that consist of both parts are
referred to as fully automatic algorithms and those that consist
of only the second part are called partially automatic
algorithms. Partially automatic algorithms are given a facial
image and the coordinates of the center of the eyes. Fully
automatic algorithms are only given facial images.
On the other hand, the development of face recognition
over the past years allows an organization into three types of
recognition algorithms, namely frontal, profile, and viewtolerant recognition, depending on the kind of images and the
recognition algorithms. While frontal recognition certainly is
the classical approach, view-tolerant algorithms usually
perform recognition in a more sophisticated fashion by taking
into consideration some of the underlying physics, geometry,
and statistics. Profile schemes as stand-alone systems have a
rather marginal significance for identification, (for more detail
see [4]). However, they are very practical either for fast coarse
pre-searches of large face database to reduce the
computational load for a subsequent sophisticated algorithm,
or as part of a hybrid recognition scheme. Such hybrid
approaches have a special status among face recognition
systems as they combine different recognition approaches in
an either serial or parallel order to overcome the shortcoming
of the individual components.
Another way to categorize face recognition techniques is to
consider whether they are based on models or exemplars.
Models are used in [5] to compute the Quotient Image, and in
[6] to derive their Active Appearance Model. These models
capture class information (the class face), and provide strong
constraints when dealing with appearance variation. At the
other extreme, exemplars may also be used for recognition.
The ARENA method in [7] simply stores all training and
matches each one against the task image. As far we can tell,
current methods that employ models do not use exemplars,
and vice versa. This is because these two approaches are by no
means mutually exclusive. Recently, [8] proposed a way of
combining models and exemplars for face recognition. In
which, models are used to synthesize additional training
images, which can then be used as exemplars in the learning
stage of a face recognition system.
Focusing on the aspect of pose invariance, face recognition
approaches may be divided into two categories: (i) global
approach and (ii) component-based approach. In global
approach, a single feature vector that represents the whole
face image is used as input to a classifier. Several classifiers
have been proposed in the literature e.g. minimum distance
classification in the eigenspace [9,10], Fisher’s discriminant
analysis [11], and neural networks [12]. Global techniques
work well for classifying frontal views of faces. However,
they are not robust against pose changes since global features
are highly sensitive to translation and rotation of the face. To
avoid this problem an alignment stage can be added before
classifying the face. Aligning an input face image with a
reference face image requires computing correspondence
between the two face images. The correspondence is usually
determined for a small number of prominent points in the face
like the center of the eye, the nostrils, or the corners of the
mouth. Based on these correspondences, the input face image
can be warped to a reference face image.
In [13], an affine transformation is computed to perform the
warping. Active shape models are used in [14] to align input
faces with model faces. A semi-automatic alignment step in
combination with support vector machines classification was
proposed in [15]. An alternative to the global approach is to
classify local facial components. The main idea of component
based recognition is to compensate for pose changes by
allowing a flexible geometrical relation between the
components in the classification stage.
In [16], face recognition was performed by independently
matching templates of three facial regions (eyes, nose and
mouth). The configuration of the components during
classification was unconstrained since the system did not
include a geometrical model of the face. A similar approach
with an additional alignment stage was proposed in [17]. In
[18], a geometrical model of a face was implemented by a
2D elastic graph. The recognition was based on wavelet
coefficients that were computed on the nodes of the elastic
graph. In [19], a window was shifted over the face image and
the DCT coefficients computed within the window were fed
into a 2D Hidden Markov Model.
Face recognition research still face challenge in some
specific domains such as pose and illumination changes.
Although numerous methods have been proposed to solve
such problems and have demonstrated significant promise, the
difficulties still remain. For these reasons, the matching
performance in current automatic face recognition is relatively
poor compared to that achieved in fingerprint and iris
matching, yet it may be the only available measuring tool for
an application. Error rates of 2-25% are typical. It is effective
if combined with other biometric measurements.
Current systems work very well whenever the test image to
be recognized is captured under conditions similar to those of
the training images. However, they are not robust enough if
there is variation between test and training images [20].
Changes in incident illumination, head pose, facial expression,
hairstyle (include facial hair), cosmetics (including eyewear)
and age, all confound the best systems today.
As a general rule, we may categorize approaches used to
cope with variation in appearance into three kinds: invariant
features, canonical forms, and variation- modeling. The first
approach seeks to utilize features that are invariant to the
changes being studied. For instance, the Quotient Image [5] is
(by construction) invariant to illumination and may be used to
recognize faces (assumed to be Lambertian) when lighting
conditions change.
The second approach attempts to “normalize” away the
variation, either by clever image transformations or by
synthesizing a new image (from the given test image) in some
89
International Journal of Signal Processing Volume 2 Number 2
“canonical” or “prototypical” form. Recognition is then
performed using this canonical form. Examples of this
approach include [21,22]. In [21], for instance, the test image
under arbitrary illumination is re-rendered under frontal
illumination, and then compared against other frontallyilluminated prototypes.
The third approach of variation-modeling is self
explanatory: the idea is to learn, in some suitable subspace,
the extent of the variation in that space. This usually leads to
some parameterization of the subspace(s). Recognition is then
performed by choosing the subspace closest to the test image,
after the latter has been appropriately mapped. In effect, the
recognition step recovers the variation (e.g. pose estimation)
as well as the identity of the person. For examples of this
technique, see [18, 23, 24 and 25].
Despite the plethora of techniques, and the valiant effort of
many researchers, face recognition remains a difficult,
unsolved problem in general. While each of the above
approaches works well for the specific variation being studied,
performance degrades rapidly when other variations are
present. For instance, a feature invariant to illumination works
well as long as pose or facial expression remains constant, but
fails to be invariant when pose or expression is changed. This
is not a problem for some applications, such as controlling
access to a secured room, since both the training and test
images may be captured under similar conditions. However,
for general, unconstrained recognition, none of these
techniques are robust enough.
Moreover, it is not clear that different techniques can be
combined to overcome each other’s limitations. Some
techniques, by their very nature, exclude others. For example,
the Symmetric Shape-from-Shading method of [22] relies on
the approximate symmetry of a frontal face. It is unclear how
this may be combined with a technique that depends on side
profiles, where the symmetry is absent.
We can make two important observations after surveying
the research literature: (1) there does not appear to be any
feature, set of features, or subspace that is simultaneously
invariant to all the variations that a face image may exhibit,
(2) given more training images, almost any technique will
perform better. These two factors are the major reasons why
face recognition is not widely used in real-world applications.
The fact is that for many applications, it is usual to require the
ability to recognize faces under different variations, even
when training images are severely limited.
A. Eigenfaces
Eigenface is one of the most thoroughly investigated
approaches to face recognition. It is also known as KarhunenLoève expansion, eigenpicture, eigenvector, and principal
component. References [26, 27] used principal component
analysis to efficiently represent pictures of faces. They argued
that any face images could be approximately reconstructed by
a small collection of weights for each face and a standard face
picture (eigenpicture). The weights describing each face are
obtained by projecting the face image onto the eigenpicture.
Reference [28] used eigenfaces, which was motivated by the
technique of Kirby and Sirovich, for face detection and
identification.
In mathematical terms, eigenfaces are the principal
components of the distribution of faces, or the eigenvectors of
the covariance matrix of the set of face images. The
eigenvectors are ordered to represent different amounts of the
variation, respectively, among the faces. Each face can be
represented exactly by a linear combination of the eigenfaces.
It can also be approximated using only the “best” eigenvectors
with the largest eigenvalues. The best M eigenfaces construct
an M dimensional space, i.e., the “face space”. The authors
reported 96 percent, 85 percent, and 64 percent correct
classifications averaged over lighting, orientation, and size
variations, respectively. Their database contained 2,500
images of 16 individuals.
As the images include a large quantity of background area,
the above results are influenced by background. The authors
explained the robust performance of the system under
different lighting conditions by significant correlation between
images with changes in illumination. However, [29] showed
that the correlation between images of the whole faces is not
efficient
for
satisfactory
recognition
performance.
Illumination normalization [27] is usually necessary for the
eigenfaces approach.
Reference [30] proposed a new method to compute the
covariance matrix using three images each was taken in
different lighting conditions to account for arbitrary
illumination effects, if the object is Lambertian. Reference
[31] extended their early work on eigenface to eigenfeatures
corresponding to face components, such as eyes, nose, and
mouth. They used a modular eigenspace which was composed
of the above eigenfeatures (i.e., eigeneyes, eigennose, and
eigenmouth). This method would be less sensitive to
appearance changes than the standard eigenface method. The
system achieved a recognition rate of 95 percent on the
FERET database of 7,562 images of approximately 3,000
individuals. In summary, eigenface appears as a fast, simple,
and practical method. However, in general, it does not provide
invariance over changes in scale and lighting conditions.
Recently, in [32] experiments with ear and face recognition,
using the standard principal component analysis approach ,
showed that the recognition performance is essentially
identical using ear images or face images and combining the
two for multimodal recognition results in a statistically
significant performance improvement. For example, the
difference in the rank-one recognition rate for the day
variation experiment using the 197-image training sets is
II. LITERATURE REVIEW OF FACE RECOGNITION TECHNIQUES
This section gives an overview on the major human face
recognition techniques that apply mostly to frontal faces,
advantages and disadvantages of each method are also given.
The methods considered are eigenfaces (eigenfeatures), neural
networks, dynamic link architecture, hidden Markov model,
geometrical feature matching, and template matching. The
approaches are analyzed in terms of the facial representations
they used.
90
International Journal of Signal Processing Volume 2 Number 2
90.9% for the multimodal biometric versus 71.6% for the ear
and 70.5% for the face.
There is substantial related work in multimodal biometrics.
For example [33] used face and fingerprint in multimodal
biometric identification, and [34] used face and voice.
However, use of the face and ear in combination seems more
relevant to surveillance applications.
misclassified to the wrong subnet, the rightful subnet will tune
its parameters so that its decision-region can be moved closer
to the misclassified sample.
PDBNN-based biometric identification system has the
merits of both neural networks and statistical approaches, and
its distributed computing principle is relatively easy to
implement on parallel computer. In [39], it was reported that
PDBNN face recognizer had the capability of recognizing up
to 200 people and could achieve up to 96% correct
recognition rate in approximately 1 second. However, when
the number of persons increases, the computing expense will
become more demanding. In general, neural network
approaches encounter problems when the number of classes
(i.e., individuals) increases. Moreover, they are not suitable
for a single model image recognition test because multiple
model images per person are necessary in order for training
the systems to “optimal” parameter setting.
B. Neural Networks
The attractiveness of using neural networks could be due to
its non linearity in the network. Hence, the feature extraction
step may be more efficient than the linear Karhunen-Loève
methods. One of the first artificial neural networks (ANN)
techniques used for face recognition is a single layer adaptive
network called WISARD which contains a separate network
for each stored individual [35]. The way in constructing a
neural network structure is crucial for successful recognition.
It is very much dependent on the intended application. For
face detection, multilayer perceptron [36] and convolutional
neural network [37] have been applied. For face verification,
[38] is a multi-resolution pyramid structure. Reference [37]
proposed a hybrid neural network which combines local
image sampling, a self-organizing map (SOM) neural
network, and a convolutional neural network. The SOM
provides a quantization of the image samples into a
topological space where inputs that are nearby in the original
space are also nearby in the output space, thereby providing
dimension reduction and invariance to minor changes in the
image sample. The convolutional network extracts
successively larger features in a hierarchical set of layers and
provides partial invariance to translation, rotation, scale, and
deformation. The authors reported 96.2% correct recognition
on ORL database of 400 images of 40 individuals.
The classification time is less than 0.5 second, but the
training time is as long as 4 hours. Reference [39] used
probabilistic decision-based neural network (PDBNN) which
inherited the modular structure from its predecessor, a
decision based neural network (DBNN) [40]. The PDBNN
can be applied effectively to 1) face detector: which finds the
location of a human face in a cluttered image, 2) eye localizer:
which determines the positions of both eyes in order to
generate meaningful feature vectors, and 3) face recognizer.
PDNN does not have a fully connected network topology.
Instead, it divides the network into K subnets. Each subset is
dedicated to recognize one person in the database. PDNN uses
the Guassian activation function for its neurons, and the
output of each “face subnet” is the weighted summation of the
neuron outputs. In other words, the face subnet estimates the
likelihood density using the popular mixture-of-Guassian
model. Compared to the AWGN scheme, mixture of Guassian
provides a much more flexible and complex model for
approximating the time likelihood densities in the face space.
The learning scheme of the PDNN consists of two phases,
in the first phase; each subnet is trained by its own face
images. In the second phase, called the decision-based
learning, the subnet parameters may be trained by some
particular samples from other face classes. The decision-based
learning scheme does not use all the training samples for the
training. Only misclassified patterns are used. If the sample is
C. Graph Matching
Graph matching is another approach to face recognition.
Reference [41] presented a dynamic link structure for
distortion invariant object recognition which employed elastic
graph matching to find the closest stored graph. Dynamic link
architecture is an extension to classical artificial neural
networks. Memorized objects are represented by sparse
graphs, whose vertices are labeled with a multiresolution
description in terms of a local power spectrum and whose
edges are labeled with geometrical distance vectors. Object
recognition can be formulated as elastic graph matching which
is performed by stochastic optimization of a matching cost
function. They reported good results on a database of 87
people and a small set of office items comprising different
expressions with a rotation of 15 degrees.
The matching process is computationally expensive, taking
about 25 seconds to compare with 87 stored objects on a
parallel machine with 23 transputers. Reference [42] extended
the technique and matched human faces against a gallery of
112 neutral frontal view faces. Probe images were distorted
due to rotation in depth and changing facial expression.
Encouraging results on faces with large rotation angles were
obtained. They reported recognition rates of 86.5% and 66.4%
for the matching tests of 111 faces of 15 degree rotation and
110 faces of 30 degree rotation to a gallery of 112 neutral
frontal views. In general, dynamic link architecture is superior
to other face recognition techniques in terms of rotation
invariance; however, the matching process is computationally
expensive.
D. Hidden Markov Models (HMMs)
Stochastic modeling of nonstationary vector time series
based on (HMM) has been very successful for speech
applications. Reference [43] applied this method to human
face recognition. Faces were intuitively divided into regions
such as the eyes, nose, mouth, etc., which can be associated
with the states of a hidden Markov model. Since HMMs
require a one-dimensional observation sequence and images
are two-dimensional, the images should be converted into
either 1D temporal sequences or 1D spatial sequences.
91
International Journal of Signal Processing Volume 2 Number 2
In [44], a spatial observation sequence was extracted from a
face image by using a band sampling technique. Each face
image was represented by a 1D vector series of pixel
observation. Each observation vector is a block of L lines and
there is an M lines overlap between successive observations.
An unknown test image is first sampled to an observation
sequence. Then, it is matched against every HMMs in the
model face database (each HMM represents a different
subject). The match with the highest likelihood is considered
the best match and the relevant model reveals the identity of
the test face.
The recognition rate of HMM approach is 87% using ORL
database consisting of 400 images of 40 individuals. A pseudo
2D HMM [44] was reported to achieve a 95% recognition rate
in their preliminary experiments. Its classification time and
training time were not given (believed to be very expensive).
The choice of parameters had been based on subjective
intuition.
In summary, geometrical feature matching based on
precisely measured distances between features may be most
useful for finding possible matches in a large database such as
a Mug shot album. However, it will be dependent on the
accuracy of the feature location algorithms. Current automated
face feature location algorithms do not provide a high degree
of accuracy and require considerable computational time.
F. Template Matching
A simple version of template matching is that a test image
represented as a two-dimensional array of intensity values is
compared using a suitable metric, such as the Euclidean
distance, with a single template representing the whole face.
There are several other more sophisticated versions of
template matching on face recognition. One can use more than
one face template from different viewpoints to represent an
individual's face.
A face from a single viewpoint can also be represented by a
set of multiple distinctive smaller templates [49,52]. The face
image of gray levels may also be properly processed before
matching [53]. In [49], Bruneli and Poggio automatically
selected a set of four features templates, i.e., the eyes, nose,
mouth, and the whole face, for all of the available faces. They
compared the performance of their geometrical matching
algorithm and template matching algorithm on the same
database of faces which contains 188 images of 47
individuals. The template matching was superior in
recognition (100 percent recognition rate) to geometrical
matching (90 percent recognition rate) and was also simpler.
Since the principal components (also known as eigenfaces or
eigenfeatures) are linear combinations of the templates in the
data basis, the technique cannot achieve better results than
correlation [49], but it may be less computationally expensive.
One drawback of template matching is its computational
complexity. Another problem lies in the description of these
templates. Since the recognition system has to be tolerant to
certain discrepancies between the template and the test image,
this tolerance might average out the differences that make
individual faces unique.
In general, template-based approaches compared to feature
matching are a more logical approach. In summary, no
existing technique is free from limitations. Further efforts are
required to improve the performances of face recognition
techniques, especially in the wide range of environments
encountered in real world.
E. Geometrical Feature Matching
Geometrical feature matching techniques are based on the
computation of a set of geometrical features from the picture
of a face. The fact that face recognition is possible even at
coarse resolution as low as 8x6 pixels [45] when the single
facial features are hardly revealed in detail, implies that the
overall geometrical configuration of the face features is
sufficient for recognition. The overall configuration can be
described by a vector representing the position and size of the
main facial features, such as eyes and eyebrows, nose, mouth,
and the shape of face outline.
One of the pioneering works on automated face recognition
by using geometrical features was done by [46] in 1973. Their
system achieved a peak performance of 75% recognition rate
on a database of 20 people using two images per person, one
as the model and the other as the test image. References
[47,48] showed that a face recognition program provided with
features extracted manually could perform recognition
apparently with satisfactory results. Reference [49]
automatically extracted a set of geometrical features from the
picture of a face, such as nose width and length, mouth
position, and chin shape. There were 35 features extracted
form a 35 dimensional vector. The recognition was then
performed with a Bayes classifier. They reported a recognition
rate of 90% on a database of 47 people.
Reference [50] introduced a mixture-distance technique
which achieved 95% recognition rate on a query database of
685 individuals. Each face was represented by 30 manually
extracted distances. Reference [51] used Gabor wavelet
decomposition to detect feature points for each face image
which greatly reduced the storage requirement for the
database. Typically, 35-45 feature points per face were
generated. The matching process utilized the information
presented in a topological graphic representation of the feature
points. After compensating for different centroid location, two
cost values, the topological cost, and similarity cost, were
evaluated. The recognition accuracy in terms of the best match
to the right person was 86% and 94% of the correct person's
faces was in the top three candidate matches.
G. 3D Morphable Model
The morphable face model is based on a vector space
representation of faces [54] that is constructed such that any
convex combination of shape and texture vectors of a set of
examples describes a realistic human face.
Fitting the 3D morphable model to images can be used in
two ways for recognition across different viewing conditions:
Paradigm 1. After fitting the model, recognition can be based
on model coefficients, which represent intrinsic shape and
texture of faces, and are independent of the imaging
conditions: Paradigm 2. Three-dimension face reconstruction
can also be employed to generate synthetic views from gallery
probe images [55-58]. The synthetic views are then
92
International Journal of Signal Processing Volume 2 Number 2
transferred to a second, viewpoint-dependent recognition
system.
More recently, [59] combines deformable 3 D models with
a computer graphics simulation of projection and illumination.
Given a single image of a person, the algorithm automatically
estimates 3D shape, texture, and all relevant 3D scene
parameters. In this framework, rotations in depth or changes
of illumination are very simple operations, and all poses and
illuminations are covered by a single model. Illumination is
not restricted to Lambertian reflection, but takes into account
specular reflections and cast shadows, which have
considerable influence on the appearance of human skin.
This approach is based on a morphable model of 3D faces
that captures the class-specific properties of faces. These
properties are learned automatically from a data set of 3D
scans. The morphable model represents shapes and textures of
faces as vectors in a high-dimensional face space, and
involves a probability density function of natural faces within
face space. The algorithm presented in [59] estimates all 3D
scene parameters automatically, including head position and
orientation, focal length of the camera, and illumination
direction. This is achieved by a new initialization procedure
that also increases robustness and reliability of the system
considerably. The new initialization uses image coordinates of
between six and eight feature points.
The percentage of correct identification on CMU-PIE
database, based on side-view gallery, was 95% and the
corresponding percentage on the FERET set, based on frontal
view gallery images, along with the estimated head poses
obtained from fitting, was 95.9%.
edge map to line segments. After thinning the edge map, a
polygonal line fitting process [62] is applied to generate the
LEM of a face. An example of a human frontal face LEM is
illustrated in Fig. 1. The LEM representation reduces the
storage requirement since it records only the end points of line
segments on curves. Also, LEM is expected to be less
sensitive to illumination changes due to the fact that it is an
intermediate-level image representation derived from low
level edge map representation. The basic unit of LEM is the
line segment grouped from pixels of edge map.
A face prefilering algorithm is proposed that can be used as
a preprocess of LEM matching in face identification
application. The prefilering operation can speed up the search
by reducing the number of candidates and the actual face
(LEM) matching is only carried out on a subset of remaining
models.
Experiments on frontal faces under controlled /ideal
conditions indicate that the proposed LEM is consistently
superior to edge map. LEM correctly identify 100% and
96.43% of the input frontal faces on face databases [63,64],
respectively. Compared with the eigenface method, LEM
performed equally as the eigenface method for faces under
ideal conditions and significantly superior to the eigenface
method for faces with slight appearance variations (see Table
I). Moreover, the LEM approach is much more robust to size
variation than the eigenface method and edge map approach
(see Table II) .
In [61], the LEM approach is shown to be significantly
superior to the eigenface approach for identifying faces under
varying lighting condition. The LEM approach is also less
sensitive to pose variations than the eigenface method but
more sensitive to large facial expression changes.
III. RECENT TECHNIQUES
A. Line Edge Map (LEM)
Edge information is a useful object representation feature
that is insensitive to illumination changes to certain extent.
Though the edge map is widely used in various pattern
recognition fields, it has been neglected in face recognition
except in recent work reported in [60].
Edge images of objects could be used for object recognition
and to achieve similar accuracy as gray-level pictures.
Reference [60] made use of edge maps to measure the
similarity of face images. A 92% accuracy was achieved.
Takács argued that process of face recognition might start at a
much earlier stage and edge images can be used for the
recognition of faces without the involvement of high-level
cognitive functions.
A Line Edge Map approach, proposed by [61], extracts
lines from a face edge map as features. This approach can be
considered as a combination of template matching and
geometrical feature matching. The LEM approach not only
possesses the advantages of feature-based approaches, such as
invariance to illumination and low memory requirement, but
also has the advantage of high recognition performance of
template matching.
Line Edge Map integrate the structural information with
spatial information of a face image by grouping pixels of face
Fig. 1 An illustration of a face LEM
TABLE I
FACE RECOGNITION RESULTS OF EDGE MAP. EIGINFACE (20- EIGINVECTORS),
AND LEM [61]
Bern database
AR database
Method
EM
Recognition
97.7%
rate
93
Eigenface LEM
100%
100%
EM
Eigenface
LEM
88.4%
55.4%
96.4%
International Journal of Signal Processing Volume 2 Number 2
Classification (NCC) criterion. Both approaches start with the
eigenface feature, but different in the classification algorithm.
The error rates are calculated as the function of the number of
eigenface, i.e., the feature dimension. The minimum error of
SVM is 8.79%, which is much better than the 15.14% of
NCC.
In [68], the face recognition problem is formulated as a
problem in difference space, which models dissimilarities
between two facial images. In different space they formulate
face recognition as a two class problem. The cases are:
(i) Dissimilarities between faces of the same person, and (ii)
Dissimilarities between faces of different people. By
modifying the interpretation of the decision surface generated
a similarity metric between faces, that is learned from
examples of differences between faces. The SVM-based
algorithm is compared with a principal component analysis
(PCA) based algorithm on a difficult set of images from the
FERET database. Performance was measured for both
verification and identification scenarios. The identification
performance for SVM is 77-78% versus 54% for PCA. For
verification, the equal error rate is 7% for SVM and 13% for
PCA.
Reference [69] presented a component-based technique and
two global techniques for face recognition and evaluated their
performance with respect to robustness against pose changes.
The component-based system detected and extracted a set of
10 facial components and arranged them in a single feature
vector that was classified by linear SVMs. In both global
systems the whole face is detected, extracted from the image
and used as input to the classifiers. The first global system
consisted of a single SVM for each person in the database. In
the second system, the database of each person is clustered
and trained on a set of view-specific SVM classifiers. The
systems were tested on a database consisting of 8.593 gray
faces mages which included faces rotated in depth up to about
400. In all experiments the component-based system
outperformed the global systems even though a more powerful
classifier is used (i.e. non-linear instead of linear SVMs) for
the global system. This shows that using facial components
instead of the whole face pattern as input features significantly
simplifies the test of face recognition.
Reference [70] presented a new development in component
based face recognition by incorporation a 3D morphable
model into the training process. Based on two face images of a
person and a 3D morphable model into they computed the 3D
face model of each person in the database. By rendering the
3D models under varying poses and lighting conditions, a
large number of synthetic face images is used to train the
component based recognition system. A component based
recognition rates around 98% is achieved for faces rotated up
to ± 360 in depth. A major drawback of the system was the
need of a large number of training images taken from
viewpoints and under different lighting conditions.
In [71], a client-specific solution is adopted which requires
learning client-specific support vectors. This representation is
different from the one given in [68]. Where in [68], as
mentioned before, SVM was trained to distinguish between
the populations of within-client and between-client difference
images respectively. Moreover, they investigate the inherent
TABLE II
RECOGNITION RESULTS WITH SIZE VARIATIONS [61]
Edge map
Top 1
43.3%
Top 5
56.0%
Top 10
64.7%
Eigenface (112-eigenvectors)
44.9%
68.8%
75.9%
LEM (pLHD)
53.8%
67.6%
71.9%
LEM (LHD)
66.5%
75.9%
79.7%
B. Support Vector Machine (SVM)
SVM is a learning technique that is considered an effective
method for general purpose pattern recognition because of its
high generalization performance without the need to add other
knowledge [65]. Intuitively, given a set of points belonging to
two classes, a SVM finds the hyperplane that separates the
largest possible fraction of points of the same class on the
same side, while maximizing the distance from either class to
the hyperplane. According to [65], this hyperplane is called
Optimal Separating Hyperplane (OSH) which minimizes the
risk of misclassifying not only the examples in the training set
but also the unseen example of the test set.
SVM can also be viewed as a way to train polynomial
neural networks or Radial Basis function classifiers. The
training techniques used here are based on the principle of
Structure Risk Minimization (SRM), which states that better
generalization capabilities are achieved through a
minimization of the bound on the generalization error. Indeed,
this learning technique is just equivalent to solving a linearly
constrained Quadratic Programming (QP) problem. SVM is
suitable for average size face recognition systems because
normally those systems have only a small number of training
samples. But in a large number of QP problems, Reference
[66] presented a decomposition algorithm that guarantees
global optimality, and can be used to train SVMs over very
large data set.
In summary, the main characteristics of SVMs are: (1) that
they minimize a formally proven upper bound on the
generalization error; (2) that they work on high-dimensional
feature spaces by means of a dual formulation in terms of
kernels; (3) that the prediction is based on hyperplanes in
these feature spaces, which may correspond to quite involved
classification criteria on the input data; and (4) that outliers in
the training data set can be handled by means of soft margins.
The application of SVMs to computer vision problem have
been proposed recently. Reference [67] used the SVMs with a
binary tree recognition strategy to tackle the face recognition
problem. After the features are extracted, the discrimination
functions between each pair are learned by SVMs. Then, the
disjoint test set enters the system for recognition. They
propose to construct a binary tree structure to recognize the
testing samples. Two sets of experiments were presented. The
first experiment is on the Cambridge Olivetti Research Lab
(ORL) face database of 400 images of 40 individuals. The
second is on a larger data set of 1079 images of 137
individuals. The SVM based recognition was compared with
standard eigenfaces approach using the Nearest Center
94
International Journal of Signal Processing Volume 2 Number 2
Experiments were made on two different face databases
(Yale and AR databases). The results obtained appear in Table
IV. The SVM was used only with polynomial (up to degree 3)
and Guassian kernels (while varying the kernel parameter σ).
potential of SVM to extract the relevant discriminatory
information from the training data irrespective of
representation and pre-processing. In order to achieve this
object, they have designed experiments in which faces are
represented in both Principal Component (PC) and Linear
Discriminant (LD) subspace. The latter basis (Fisherfaces) is
used as an example of a face representation with focus on
discriminatory feature extraction while the former achieves
simply data compression. They also study the effect of image
photometric normalization on performance of the SVM
method, the experimental results showing superior
performance in comparison with benchmark methods.
However, when the representation space already captures and
emphasizes the discriminatory information, SVMs loose their
superiority . The results also indicate that the SVMs are robust
against changes in illumination provided these are adequately
represented in the training data. The proposed system is
evaluated on a large database of 295 people obtaining highly
competitive results: an equal rate of 1% for verification and a
rank-one error rate of 2% for recognition.
In [72], a novel structure is proposed to tackle multi-class
classification problem for a K-class classification test, an array
of K optimal pairwise coupling classifier (O-PWC) is
constructed, each of which is the most reliable and optimal for
the corresponding class in the sense of cross entropy of
square error. The final decision will be got through combining
the results of these K O-PWC. This algorithm is applied on the
ORL face database, which consists of 400 images of 40
individuals, containing quite a high degree of variability in
expression, pose and facial details. The training set included
200 samples (5 for each individual). The remaining 200
samples are used as the test set. The results show that, the
accuracy rate is improved while the computational cost will
not increase too much. Table III shows the comparison of
different recognition methods on ORL database.
TABLE IV
RECOGNITION RATES OBTAINED FOR YALE AND AR IMAGES USING THE
NEAREST MEAN CLASSIFIER (NMC) AND SVM. FOR SVM, A VALUE OF 1000
WAS USED AS MISCLASSIFICATION WEIGHT. THE LAST COLUMN REPRESENTS
Yale
AR
THE RESULTS OBTAINED BY VARYING σ
NMC using
SVM
Euclidean distance
P=1
P=2
P=3
PCA
92.73%
98.79%
98.79%
98.79%
Gaussian
99.39%
ICA
95.76%
99.39%
99.39%
99.39%
99.39%
PCA
48.33%
92%
91.67%
91%
92.67%
ICA
70.33%
93.33%
93.33%
92.67%
94%
A Support Vector Machine based multi-view face detection
and recognition framework is described in [74]. Face
detection is carried out by constructing several detectors, each
of them in charge of one specific view. The symmetrical
property of face images is employed to simplify the
complexity of the modeling. The estimation of head pose,
which is achieved by using the Support Vector Regression
technique, provides crucial information for choosing the
appropriate face detector. This helps to improve the accuracy
and reduce the computation in multi-view face detection
compared to other methods.
For video sequences, further computational reduction can
be achieved by using Pose Change Smoothing strategy. When
face detectors find a face in frontal view, a Support Vector
Machine based multi-class classifier is activated for face
recognition. All the above issues are integrated under a
Support Vector Machine framework. An important
characteristic of this approach is that it can obtain a robust
performance in a poorly constrained environment, especially
for low resolution, large scale changes, and rotation in depth.
Test results on four video sequences are presented, among
them, detection rate is above 95%, recognition accuracy is
above 90%, and the full detection and recognition speed is up
to 4 frames/second on a PentiumII300 PC.
In [75], a new face recognition method, which combines
several SVM classifiers and a NN arbitrator is presented. The
proposed method does not use any explicit feature extraction
scheme. Instead the SVMs receive the gray level values of raw
pixels as the input pattern. The rationale for this configuration
is that a SVM has the capability of learning in highdimensional space, such as gray-level face-image space.
Furthermore, the use of SVMs with a local correlation kernel
(modified form of polynomial kernel method) provides an
effective combination of feature extraction and classification,
thereby eliminating the need for a carefully designed feature
extractor.
The scaling problem that occurs when arbitrating multiple
SVMs is resolved by adopting a NN as a trainable scalier.
From experimental results using the ORL database (see Fig.
3), the proposed method resulted in a 97.9% recognition rate
with an average processing time of 0.22 seconds for a face
TABLE III
RECOGNITION ACCURACY RATE COMPARISON
O-PWC
Max
O-PWC
Method
PWC
(Cross
(Square Error)
Voting
Entropy)
Rate
94%
95.13
96.79%
98.11%
Reference [73] combine SVM and Independent Component
Analysis (ICA) techniques for the face recognition problem.
ICA can be considered as a generalization of Principle
Component Analysis. Fig. 2 shows the difference between
PCA and ICA basis images.
Fig. 2 Some original (left), PCA (center) and ICA (right) basis
images for the Yale Face Database
95
International Journal of Signal Processing Volume 2 Number 2
pattern with 40 classes. Moreover, comparison with other
known results on the same database. Table V shows a
summary of the performance of various systems for which
results using the ORL database are available. The proposed
method showed the best performance and significant reduction
of error rate (44.7%) from the second best performing system–
convolutional NN.
and correlation coefficient classifiers. However, from the
practical point of view the difference is insignificant.
Reference [77] describes an approach for the problem of
face pose discrimination using SVM. Face pose discrimination
means that one can label the face image as one of several
known poses. Face images are drawn from the standard
FERET database, see Fig. 4.
Fig. 3 Sample images obtained from ORL database
TABLE V
ERROR RATES OF VARIOUS SYSTEMS
Method
Eigenfaces
Psudo-2DHMM
Convolutional NN
SVMs with local
correlation kernel
Error rate (%)
10.0
5.0
3.8
2.1
Fig. 4 Examples of (a) training and (b) test images
The training set consists of 150 images equally distributed
among frontal, approximately 33.75o rotated left and right
poses, respectively, and the test set consists of 450 images
again equally distributed among the three different types of
poses. SVM achieved perfect accuracy - 100% discriminating between the three possible face poses on
unseen test data, using either polynomials of degree 3 or
Radial Basis Functions (RBFs) as kernel approximation
functions. Experimental results using polynomial kernels and
RBF kernels are given in Tables VI-VII respectively.
On the other hand, [76] studied SVMs in the context of face
authentication (verification). Their study supports the
hypothesis that the SVM approach is able to extract the
relevant discriminatory information from the training data and
this is the main reason for its superior performance over
benchmark methods. When the representation space already
captures and emphasizes the discriminatory information
content as in the case of Fisherfaces, SVMs loose their
superiority. SVMs can also cope with illumination changes,
provided these are adequately represented in the training data.
However, on data which has been sanitized by feature
extraction (Fisherfaces) and/or normalization, SVMs can get
over-trained, resulting in the loss of the ability to generalize.
The following conclusion can be drawn from their work:
(1) the SVM approach is able to extract the relevant
discriminatory information from the data fully automatically.
It can also cope with illumination changes. The major role in
this characteristic is played by the SVMs ability to learn nonlinear decision boundaries, (2) on data which has been
sanitised by feature extraction (Fisherfaces) and/or
normalization, SVMs can get over-trained, resulting in the
loss of the ability to generalize. (3) SVMs involve many
parameters and can employ different kernels. This makes the
optimization space rather extensive, without the guarantee that
it has been fully explored to find the best solution. (4) a SVM
takes about 5 seconds to train per client (on a Sun Ultra
Enterprise 450). This is about an order of magnitude longer
than determining client-specific thresholds for the Euclidean
TABLE VI
EXPERIMENT RESULTS USING POLYNOMIAL KERNELS
Testing
Number
Training
Testing
Accuracy
Classifiers
Of
Accuracy
Accuracy
Using max.
type
Support
On 150
On 450
output from three
vectors
examples
examples
classifiers
Frontal vs
33
100%
99.33%
others
Left
33.750 vs
25
100%
99.56%
100%
others
Right
37
100%
99.78%
33.750 vs
others
96
International Journal of Signal Processing Volume 2 Number 2
multiple classifiers has emerged over recent years and
represented a departure from the traditional strategy. This
approach goes under various names such as MCS or
committee or ensemble of classifiers, and has been developed
to address the practical problem of designing automatic
pattern recognition systems with improved accuracy.
A parameter-based combined classifier has been developed
in [79] in order to improve the generalization capability and
hence the system performance of face recognition system. A
combination of three LVQ neural networks that are trained on
different parameters proved successful in generalization for
invariant face recognition. The combined classifier resulted in
improved system accuracy compared to the component
classifiers. With only three training faces, the system
performance in the case of the KUFB is 100%.
Reference [80] presents a system for invariant face
recognition. A combined classifier uses the generalization
capabilities of both LVQ and Radial Basis Function (RBF)
neural networks to build a representative model of a face from
a variety of training patterns with different poses, details and
facial expressions. The combined generalization error of the
classifier is found to be lower than that of each individual
classifier. A new face synthesis method is implemented for
reducing the false acceptance rate and enhancing the rejection
capability of the classifier. The system is capable of
recognizing a face in less than one second. The well-known
ORL database is used for testing the combined classifier. In
the case of the ORL database, a correct recognition rate of
99.5% at 0.5% rejection rate is achieved.
Reference [81] represents a face recognition committee
machine (FRCM), which assembles the outputs of various
face recognition algorithms, Eigenface, Fisherface, Elastic
Graph Matching (EGM), SVM and neural network, to obtain a
unified decision with improved accuracy. This FRCM
outperforms all the individuals on average. It achieves 86.1%
on Yale face database and 98.8% on ORL face database.
In [82], a hybrid face recognition method that combines
holistic and feature analysis-based approach using a Markov
random field (MRF) model is presented. The face images are
divided into small patches, and the MRF model is used to
represent the relationship between the image patches and the
patch ID's. The MRF model is first learned from the training
image patches, given a test image. The most probable patch
ID's are then inferred using the belief propagation (BP) HM.
Finally, the ID of the image is determined by a voting scheme
from the estimated patch ID's. This method achieved 96.11%
on Yale face database and 86.95% on ORL face database.
In [83], a combined classifier system consisting of an
ensemble of neural networks is based on varying the
parameters related to the design and training of classifiers.
The boosted algorithm is used to make perturbation of the
training set employing MLP as base classifier. The final result
is combined by using simple majority vote rule. This system
achieved 99.5% on Yale face database and 100% on ORL face
database. To the best of our knowledge, these results are the
best in the literatures.
TABLE VII
EXPERIMENT RESULTS USING RBF KERNELS
Classifiers
type
Frontal vs
others
Left
33.750 vs
others
Right
33.750 vs
others
Number
Of
Support
vectors
Training
Accuracy
On 150
examples
Testing
Accuracy
On 450
examples
47
100%
100%
38
100%
100%
43
100%
100%
Testing
Accuracy
Using max.
output from three
classifiers
100%
Reference [78] presents a method for authenticating an
individual’s membership in a dynamic group without
revealing the individuals and without restricting the group size
and/or the members of the group. They treat the membership
authentication as a two-class face classification problem to
distinguish a small size set (membership) from its
complementary set (non-membership) in the universal set. In
the authentication, the false-positive error is the most critical.
Fortunately, the error can be validly removed by using SVM
ensemble, where each SVM acts as an independent
membership/ non-membership classifier and several SVMs are
combined in a plurality voting scheme that chooses the
classification made by more than half of SVMs.
For a good encoding of face images, the Gabor filtering,
principal component analysis and linear discrimination
analysis have been applied consecutively to the input face
image for achieving effective representation, efficient
reduction of the data dimension and storing separation of
different faces, respectively. Next, the SVM ensemble is
applied to authenticate an input face image whether it is
included in the membership group or not. Experiment results
showed that the SVM ensemble has the ability to recognize
non-membership and a stable robustness to cope with the
variations of either different group sizes or different group
members. The correct authentication rate is almost constant in
the range from 97% to 98.5% without regard to the variation
of members in the group in the same group size.
However, one problem with the proposed authentication
method is that the correct classification rate for the
membership is highly degraded when the size of members is
small (<20), due to the limited training data set. Nevertheless,
simulation results show that the authentication performance of
the proposed method can keep stable for the member group
with a size of less than 50 persons.
C. Multiple Classifier Systems (MCSs)
Recently, MCSs based on the combination of outputs of a
set of different classifiers have been proposed in the field of
face recognition as a method of developing high performance
classification systems.
Traditionally, the approach used in the design of pattern
recognition systems has been to experimentally compare the
performance of several classifiers in order to select the best
one. However, an alternative approach based on combining
97
International Journal of Signal Processing Volume 2 Number 2
III. COMPARISON OF DIFFERENT FACE DATABASES
The FRVT 2002 [86] was a large-scale evaluation of
automatic face recognition technology. The primary objective
of the FRVT 2002 was to provide performance measures for
assessing the ability of automatic face recognition systems to
meet real-world requirements. From a scientific point of view,
FRVT 2002 will have an impact on future directions of
research in the computer vision and pattern recognition,
psychology, and statistics fields.
The heart of the FRVT 2002 was the high computational
intensity test (HCInt). The HCInt consisted of 121,589
operational images of 37,437 people. From this date, realworld performance figures on a very large data set were
computed. Performance statistics were computed for
verification, identification, and watch list tests.
The conclusions from FRVT 2002 are summarized below:
In Section 2, a number of face recognition algorithms have
been described. In Table VIII, we give a comparison of face
databases which were used to test the performance of these
face recognition algorithms. The description and limitations of
each database are given.
While existing publicly-available face databases contain
face images with a wide variety of poses, illumination angles,
gestures, face occlusions, and illuminant colors, these images
have not been adequately annotated, thus limiting their
usefulness for evaluating the relative performance of face
detection algorithms. For example, many of the images in
existing databases are not annotated with the exact pose angles
at which they were taken.
In order to compare the performance of various face
recognition algorithms presented in the literature there is need
for a comprehensive, systematically annotated database
populated with face images that have been captured (1) at
variety of pose angles (to permit testing of pose invariance),
(2) with a wide variety of illumination angles (to permit
testing of illumination invariance), and (3) under a variety of
commonly encountered illumination color temperatures
(permit testing of illumination color invariance).
Reference [84] presents a methodology for creating such an
annotated database that employs a novel set of apparatus for
the rapid capture of face images from a wide variety of pose
angles and illumination angles. Four different types of
illumination are used, including daylight, skylight,
incandescent and fluorescent. The entire set of images, as well
as the annotations and the experimental results, is being
placed in the public domain, and made available for download
over the worldwide web [85].
• Indoor face recognition performance has substantially
improved since FRVT 2000.
• Face recognition performance decreases approximately
linearly with elapsed time, database and new images.
• Better face recognition systems do not appear to be
sensitive to normal indoor lighting changes.
• Three-dimensional morphable models substantially
improve the ability to recognize non-frontal faces.
• On FRVT 2002 imagery, recognition from video
sequences was not better than from still images.
• Males are easier to recognize than females.
• Younger people are harder to recognize than older people.
• Outdoor face recognition performance needs improvement.
• For identification and watch list tests, performance
decreases linearly in the logarithm of the database or watch
list size.
V. SUMMARY OF THE RESEARCH RESULTS
IV. THE FACE RECOGNITION-VENDOR TEST (FRVT)
In Table IX, a summary of performance evaluations of face
recognition algorithms on different databases is given.
TABLE VIII
COMPARISON OF DIFFERENT FACE DATABASES
Database
AT&T [87]
(formerly ORL)
Oulu Physics [88]
XM2VTS [89]
Description
contains face images of 40 persons, with 10 images of each. For most subjects,
the 10 images were shot at different times and with different lighting
conditions, but always against a dark background.
includes frontal color images of 125 different faces. Each face was
photographed 16 times, using 1 of 4 different illuminants (horizon,
incandescent, fluorescent, and daylight) in combination with 1 of 4 different
camera calibrations (color balance settings). The images were captured under
dark room conditions, and a gray screen was placed behind the participant.
The spectral reflectance (over the range from 400 nm to 700 nm) was
measured at the forehead, left cheek, and right cheek of each person with a
spectrophotometer. The spectral sensitivities of the R, G and B channels of
the camera, and the spectral power of the four illuminants were also recorded
over the same spectral range.
consists of 1000 GBytes of video sequences and speech recordings taken of
295 subjects at one-month intervals over a period of 4 months (4 recording
sessions). Significant variability in appearance of clients (such as changes of
hairstyle, facial hair, shape and presence or absence of glasses) is present in
the recordings. During each of the 4 sessions a “speech” video sequence and a
“head rotation” video sequence was captured. This database is designed to test
systems designed to do multimodal (video + audio) identification of humans
by facial and voice features.
98
Limitation
(1) limited number of people (2) illumination conditions
are not consistent from image to image. (3) the images are
not annotated for different facial expressions, head
rotation, or lighting conditions.
(1) although this database contains images captured under
a good variety of illuminant colors, and the images are
annotated for illuminant, there are no variations in the
lighting angle.
(2) all of the face images are basically frontal (with some
variations in pose angle and distance from the camera)
it does not include any information about the image
acquisition parameters, such as illumination angle,
illumination color, or pose angle.
International Journal of Signal Processing Volume 2 Number 2
Database
Yale [90]
Yale B [91]
MIT [92]
CMU Pose,
Illumination, and
Expression (PIE)
[93]
UMIST [94]
Bern University face
database [63]
Purdue AR [64]
The University of
Stirling online database
[95]
The FERET [96]
Kuwait University face
database (KUFDB )
[97]
Description
contains frontal grayscale face images of 15 people, with 11 face images of
each subject, giving a total of 165 images. Lighting variations include left-light,
center-light, and right-light. Spectacle variations include with-glasses and
without-glasses. Facial expression variations include normal, happy, sad,
sleepy, surprised, and wink.
Limitation
(1) limited number of people
(2) while the face images in this database were taken with
3 different lighting angles (left, center, and right) the
precise positions of the light sources are not specified.
(3) since all images are frontal, there are no pose angle
variations. (4) Environmental factors (such as the presence
or absence of ambient light) are also not described.
contains grayscale images of 10 subjects with 64 different lighting angles and (1) limited number of Subjects.
9 different poses angles, for a total of 5760 images. Pose 0 is a frontal view, in (2) the background in these images is not homogeneous,
which the subject directs his/her gaze directly into the camera lens. In poses 1, and is cluttered. (3) The 9 different pose angles in these
2, 3, 4, and 5 the subject is gazing at 5 points on a semicircle about 12 degrees images were not precisely controlled. Where the exact head
away from the camera lens, in the left visual field. In poses 6, 7, and 8 the orientation (both vertically and horizontally) for each pose
subject is gazing at 3 different points on a semicircle about 24 degrees away was chosen by the subject.
from the camera lens, again in the left visual field. The images were captured
with an overhead lighting structure which was fitted with 64 computercontrolled xenon strobe lights. For each pose, 64 images were captured of
each subject at a rate of 30 frames/sec, over a period of about 2 seconds.
Contains 16 subjects. Each subject sat on a couch and was photographed 27
times, while varying head orientation. The lighting direction and the camera
zoom were also varied during the sequence. The resulting 480 x 512 grayscale
images were then filtered and sub sampled by factors of 2, to produce six
levels of a binary Gaussian pyramid. The six “pyramid levels” are annotated
by an X-by-Y pixel count, which ranged from 480x512 down to 15x16.
contains images of 68 subjects that were captured with 13 different poses, 43
different illumination conditions, and 4 different facial expressions, for a total
of 41,368 color images with a resolution of 640 x 486. Two sets of images
were captured – one set with ambient lighting present, and another set with
ambient lighting absent.
consists of 564 grayscale images of 20 people of both sexes and various races.
(Image size is about 220 x 220.) Various pose angles of each person are
provided, ranging from profile to frontal views.
contains frontal views of 30 people. Each person has 10 gray-level images
with different head pose variations (two front parallel pose, two looking to the
right, two looking to the left, two looking downwards, and two looking
upwards). All images are taken under controlled/ideal conditions.
contains over 4,000 color frontal view images of 126 people's faces (70 men
and 56 women) that were taken during two different sessions separated by 14
days. Similar pictures were taken during the two sessions. No restrictions on
clothing, eyeglasses, make-up, or hair style were imposed upon the
participants. Controlled variations include facial expressions (neutral, smile,
anger, and screaming), illumination (left light on, right light on, all side lights
on), and partial facial occlusions (sun glasses or a scarf).
was created for use in psychology research, and contains pictures of faces,
objects, drawings, textures, and natural scenes. A web-based retrieval system
allows a user to select from among the 1591 face images of over 300 subjects
based on several parameters, including male, female, grayscale, color, profile
view, frontal view, or 3/4 view.
contains face images of over 1000 people. It was created by the FERET
program, which ran from 1993 through 1997. The database was assembled to
support government monitored testing and evaluation of face recognition
algorithms using standardized tests and procedures. The final set of images
consists of 14051 grayscale images of human heads with views that include
frontal views, left and right profile views, and quarter left and right views. It
contains many images of the same people taken with time-gaps of one year or
more, so that some facial features have changed. This is important for
evaluating the robustness of face recognition algorithms over time.
The in-house built database consists of 250 face acquired from 50 people with
five images per face. There is a total 250 gray level images (5 images x 50
people). Facial images are normalized to sizes 24 x 24, 32 x 32, and 64 x 64).
Images were acquired without any control of the laboratory illumination.
Variations in lighting, facial expression, size, and rotation, are considered.
99
(1) Although this database contains images that were
captured with a few different scale variations, lighting
variations, and pose variations, these variations were not
very extensive, and were not precisely measured. (2)
There was also apparently no effort made to prevent the
subjects from moving between pictures.
(1) there was clutter visible in the backgrounds of these
images.
(2) The exact pose angle for each image is not specified.
(1) No absolute pose angle is provided for each image.
(2) No information is provided about the illumination used
– either its direction or its color temperature.
(1) limited number of subjects.
(2) the exact pose angle for each image is not specific.
(3) there is not variation in illumination conditions.
the placement of those light sources, the color temperature
of those light sources, and whether they were diffuse of
point light sources is not specified. (The placement of the
two light sources produces objectionable glare in the
spectacles of some subjects.)
(1) no information is provided about the illumination used
during the image capture. (2) Most of these images were
also captured in front of a black background, making it
difficult to discern the boundaries of the head of those
subjects with dark hair.
(1) it does not provide a very wide variety of pose
variations.
(2) there is no information about the lighting used to
capture the images.
(1) limited number of people.
(2) it does not include any information about image
acquisition parameter, such as pose angle.
International Journal of Signal Processing Volume 2 Number 2
TABLE IX
SUMMARY OF THE RESEARCH RESULTS
Database
References
Method
Percentage of correct classification(PCC)
[31]
Eigenfeatures
95%
[28]
Eigenface
[42]
Graph matching
[50]
Geometrical feature
matching and Template
matching
FERET
[68]
AR
[70]
SVM + 3D morphable
model
[71]
SVM+PC+LD
[61]
LEM
SVM+PCA
SVM+ICA
SVM+PCA
SVM+ICA
[73]
[73]
[81]
Yale
[82]
[83]
ORL
SVM
Build face recognition
committee machine
(FRCM) of Eigenface,
Fisherface, Elastic Graph
Matching (EGM), SVM,
and Neural network
Combines holistic and
feature analysis-based
approaches using a
Markov random field
(MRF) method
Boosted parameter-based
combined classifier
Notes
This method would be less sensitive to appearance changes than
standard eigenface method. The DB contained 7,562 images of
approximately 3,000 individuals.
95% , 85% , 64% correct classifications
DB contained 2,500 images of 16 individuals; the images include
averaged over lighting, orientation, and
a large quality of background area.
size variation, respectively.
86.5% and 66.4% for the matching tests of
111 faces of 15 degree rotation and 110
faces of 30 degree rotation to a gallery of
112 neutral frontal views
Template matching achieved 100% to
These two matching algorithms occurred on the same DB which
90% for Geometrical feature matching.
contained 188 images of 47 individuals.
Identification performance is 77.78%
versus 54% for PCA. Verification
performance is 93%versus 87% for PCA.
Face rotation up to ±360 in depth.
98%
99% for verification and 98%
recognition.
96.43%
92.67%
94%
for
99.39%
99.39%
DB contained 295 people.
DB contained frontal faces under controlled condition.
SVM was used only with polynomial (up to degree 3) and
Guassian kernel.
DB contained 165 images of 15 individuals. The DB divided into
90 images (6 for each person) for training and 75 for testing (5 for
each person)
1) They adopt leaving-one-out cross validation method.
FRCM gives 86.1% and it outperforms all
2) Without the lighting variations, FRCM achieves 97.8%
the individuals on average
accuracy.
96.11 (when using 5 images for training
and 6 for testing)
99.5%
[37]
Hybrid NN: SOM+a
convolution NN
96.2%
[44]
Hidden Markov model
(HMMs)
87%
[44]
A pseudo 2DHMM
[76]
SVM with a binary tree
[72]
Optimal-Pairwise
coupling (O-PWC) SVM
[75]
Several SVM+NN
arbitrator
[28]
Eigenface
[39]
PDBNN
[80]
A combined classifier
uses the generalization
capabilities of both
Learning Vector
Quantization (LVQ) and
Radial Basis Function
(RBF) neural networks to
build a representative
model of a face from a
variety of training
patterns with different
poses, details and facial
expressions
They tested the recognition accuracy with different numbers of
training samples. K(k=1,2,…10) images of each subject were
randomly selected for training and the remaining 11-k images for
testing
The DB is divided into 75 images (5 for each person) for training
and 90 for testing (6 for each person)
DB contained 400 images of 40 individuals. The classification
time less than .5 second for recognizing one facial image, but
training time is 4 hours.
Its classification time and training time were not given (believe to
be very expensive.)
91.21% for SVM and 84.86% for Nearst They compare the SVMs with standard eigenface approach using
Center Classification (NCC)
the NCC
PWC achieved 95.13% ,
O-PWC
They select 200 samples (5 for each individual) randomly as
(cross entropy) achieved 96.79% and Otraining set. The remaining 200 samples are used as the test set.
PWC (square error) achieved 98.11%
An average processing time of .22 second for face pattern with 40
97.9%
classes. On the same DB, PCC for eigenfaces is 90% and for
pseudo-2D HMM is 95% and for convolutional NN is 96.2%
90%
PDBNN face recognizing up to 200 people in approximately 1
96%
second and the training time is 20 minutes.
95%
99.5%
100
A new face synthesis method is implemented for reducing the
false acceptance rate and enhancing the rejection capability of the
classifier. The system is capable of recognizing a face in less than
one second.
International Journal of Signal Processing Volume 2 Number 2
Database
Bern
University
face database
Kuwait
University
face database
(KUFDB )
References
Method
[81]
Build face recognition
committee machine
(FRCM) of Eigenface,
Fisherface, Elastic Graph
Matching (EGM), SVM,
and Neural network
Percentage of correct classification(PCC)
[82]
MRF
[83]
Boosted parameter-based
combined classifier
[61]
LEM
100%
[79]
Combined LVQ neural
network
100%
FRCM gives 98.8% and it outperforms all
They adopt leaving-one-out cross validation method.
the individual on average
86.95 (when using 5 images for training
and 6 for testing)
100%
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
Notes
The KUFDB includes 250 images acquired from 50 people with
five images/person. The training set has 3 images x 50 subjects
and the testing set has 2 images x 50 subjects.
[18]
Francis Galton, “Personal identification and description,” In Nature,
pp. 173-177, June 21, 1888.
W. Zaho, “Robust image based 3D face recognition,” Ph.D. Thesis,
Maryland University, 1999.
R. Chellappa, C.L. Wilson and C. Sirohey, ”Humain and machine
recognition of faces: A survey,” Proc. IEEE, vol. 83, no. 5, pp. 705740, may 1995.
T. Fromherz, P. Stucki, M. Bichsel, “A survey of face recognition,”
MML Technical Report, No 97.01, Dept. of Computer Science,
University of Zurich, Zurich, 1997.
T. Riklin-Raviv and A. Shashua, “The Quotient image: Class based
recognition and synthesis under varying illumination conditions,” In
CVPR, P. II: pp. 566-571,1999.
G.j. Edwards, T.f. Cootes and C.J. Taylor, “Face recognition using
active appearance models,” In ECCV, 1998.
T. Sim, R. Sukthankar, M. Mullin and S. Baluja, “Memory-based face
recognition for vistor identification,” In AFGR, 2000.
T. Sim and T. Kanade, “Combing models and exemplars for face
recognition: An illuminating example,” In Proceeding Of Workshop on
Models Versus Exemplars in Computer Vision, CUPR 2001.
L. Sirovitch and M. Kirby, “Low-dimensional procedure for the
characterization of human faces,” Journal of the Optical Society of
America A, vol. 2, pp. 519–524, 1987.
M. Turk and A. Pentland “Face recognition using eigenfaces,” In Proc.
IEEE Conference on Computer Vision and Pattern Recognition, pp.
586–591, 1991.
P. Belhumeur, P. Hespanha, and D. Kriegman, “Eigenfaces vs
fisherfaces: Recognition using class specific linear projection,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 19,
no. 7, pp. 711–720, 1997.
M. Fleming and G. Cottrell, “Categorization of faces using
unsupervised feature extraction,” In Proc. IEEE IJCNN International
Joint Conference on Neural Networks, pp. 65–70, 1990.
B. Moghaddam, W. Wahid, and A. Pentland, “Beyond eigenfaces:
Probabilistic matching for face recognition,” In Proc. IEEE
International Conference on Automatic Face and Gesture Recognition,
pp. 30–35, 1998.
A. Lanitis, C. Taylor, and T. Cootes, “Automatic interpretation and
coding of face images using flexible models,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 743 –
756, 1997.
K. Jonsson, J. Matas, J. Kittler, and Y. Li, “Learning support vectors
for face verification and recognition,” In Proc. IEEE International
Conference on Automatic Face and Gesture Recognition, pp. 208–213,
2000.
R. Brunelli and T. Poggio, “Face recognition: Features versus
templates,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 15, no. 10, pp. 1042–1052, 1993.
D. J. Beymer, “Face recognition under varying pose,” A.I. Memo 1461,
Center for Biological and Computational Learning, M.I.T., Cambridge,
MA, 1993.
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
[33]
101
The DB is divided into 200 images (10 for each person) for
training and 200 for testing (10 for each person)
L. Wiskott, J.-M. Fellous, N. Kruger, and C. von der Malsburg, “Face
recognition by elastic bunch graph matching,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 775 –
779, 1997.
A. Nefian and M. Hayes, ” An embedded hmm-based approach for face
detection and recognition,” In Proc. IEEE International Conference on
Acoustics, Speech, and Signal Processing, vol. 6, pp. 3553–3556,
1999.
U.S. Department of Defense, ”Facial Recognition Vendor Test, 2000,”
Available:
http://www.dodcounterdrug.com/facialrecognition/FRVT2000/frvt200
0.htm.
W. Zhao and R. Chellappa, “ Robust face recognition using symmetric
shape-from-hading,” Technical Report CARTR -919, 1999., Center for
Automation Research, University of Maryland, College Park, MD,
1999.
L. Zheng, “A new model-based lighting normalization algorithm and
its application in face recognition,” Master’s thesis, National
University of Singapore, 2000.
G.J. Edwards, T.F. Cootes, and C.J. Taylor, “Face recognition using
active appearance models,” In ECCV, 1998.
A.S. Georghiades, P.N. Belhumeur, and D.J. Kriegman, “From few to
many: Generative models for recognition under variable pose and
illumination,” In AFGR, 2000.
D.B. Graham and N.M., “Allinson face recognition from unfamiliar
views: Subspace methods and pose dependency,” In AFGR, 1998.
L. Sirovich and M. Kirby, “Low-Dimensional procedure for the
characterisation of human faces,” J. Optical Soc. of Am., vol. 4, pp.
519-524, 1987.
M. Kirby and L. Sirovich, “Application of the Karhunen- Loève
procedure for the characterisation of human faces,” IEEE Trans.
Pattern Analysis and Machine Intelligence, vol. 12, pp. 831-835, Dec.
1990.
M. Turk and A. Pentland, “Eigenfaces for recognition,” J. Cognitive
Neuroscience, vol. 3, pp. 71-86, 1991.
M.A. Grudin, “A compact multi-level model for the recognition of
facial images,” Ph.D. thesis, Liverpool John Moores Univ., 1997.
L. Zhao and Y.H. Yang, “Theoretical analysis of illumination in pcabased vision systems,” Pattern Recognition, vol. 32, pp. 547-564,
1999.
A. Pentland, B. Moghaddam, and T. Starner, “View-Based and
modular eigenspaces for face recognition,” Proc. IEEE CS Conf.
Computer Vision and Pattern Recognition, pp. 84-91, 1994.
K. Chang, K.W. Bowyer and S. Sarkar, “Comarison and combination
of ear and face images in appearance-based biometrics,” IEEE Trans.
On Pattern analysis and machine intelligence, vol. 25, no. 9,
September 2003.
L. Hong and A. Jain, “Integrating faces and fingerprints for personal
identification,” IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 20, no. 12, pp. 1295-1307, Dec. 1998.
International Journal of Signal Processing Volume 2 Number 2
[34]
[35]
[36]
[37]
[38]
[39]
[40]
[41]
[42]
[43]
[44]
[45]
[46]
[47]
[48]
[49]
[50]
[51]
[52]
[53]
[54]
[55]
[56]
[57]
[58]
[59]
P. Verlinde, G. Matre, and E. Mayoraz, “Decision fusion using a multilinear classifier,” Proc. Int’l Conf. Multisource-Multisensor
Information Fusion, vol. 1, pp. 47-53, July 1998.
T.J. Stonham, “Practical face recognition and verification with
WISARD,” Aspects of Face Processing, pp. 426-441, 1984.
K.K. Sung and T. Poggio, “Learning human face detection in cluttered
scenes,” Computer Analysis of Image and patterns, pp. 432-439, 1995.
S. Lawrence, C.L. Giles, A.C. Tsoi, and A.D. Back, “Face recognition:
A convolutional neural-network approach,” IEEE Trans. Neural
Networks, vol. 8, pp. 98-113, 1997.
J. Weng, J.S. Huang, and N. Ahuja, “Learning recognition and
segmentation of 3D objects from 2D images,” Proc. IEEE Int'l Conf.
Computer Vision, pp. 121-128, 1993.
S.H. Lin, S.Y. Kung, and L.J. Lin, “Face recognition/detection by
probabilistic decision-based neural network,” IEEE Trans. Neural
Networks, vol. 8, pp. 114-132, 1997.
S.Y. Kung and J.S. Taur, “Decision-Based neural networks with
signal/image classification applications,” IEEE Trans. Neural
Networks, vol. 6, pp. 170-181, 1995.
M. Lades, J.C. Vorbruggen, J. Buhmann, J. Lange, C. Von Der
Malsburg, R.P. Wurtz, and M. Konen, “Distortion Invariant object
recognition in the dynamic link architecture,” IEEE Trans. Computers,
vol. 42, pp. 300-311, 1993.
L. Wiskott and C. von der Malsburg, “Recognizing faces by dynamic
link matching,” Neuroimage, vol. 4, pp. 514-518, 1996.
F. Samaria and F. Fallside, “Face identification and feature extraction
using hidden markov models,” Image Processing: Theory and
Application, G. Vernazza, ed., Elsevier, 1993.
F. Samaria and A.C. Harter, “Parameterisation of a stochastic model
for human face identification,” Proc. Second IEEE Workshop
Applications of Computer Vision, 1994.
S. Tamura, H. Kawa, and H. Mitsumoto, “Male/Female identification
from 8_6 very low resolution face images by neural network,” Pattern
Recognition, vol. 29, pp. 331-335, 1996.
T. Kanade, “Picture processing by computer complex and recognition
of human faces,” technical report, Dept. Information Science, Kyoto
Univ., 1973.
A.J. Goldstein, L.D. Harmon, and A.B. Lesk, “Identification of human
faces,” Proc. IEEE, vol. 59, pp. 748, 1971.
Y. Kaya and K. Kobayashi, “A basic study on human face
recognition,” Frontiers of Pattern Recognition, S. Watanabe, ed., pp.
265, 1972.
R. Bruneli and T. Poggio, “Face recognition: features versus
templates,” IEEE Trans. Pattern Analysis and Machine Intelligence,
vol. 15, pp. 1042-1052, 1993.
I.J. Cox, J. Ghosn, and P.N. Yianios, “Feature-Based face recognition
using mixture-distance,” Computer Vision and Pattern Recognition,
1996.
B.S. Manjunath, R. Chellappa, and C. von der Malsburg, “A Feature
based approach to face recognition,” Proc. IEEE CS Conf. Computer
Vision and Pattern Recognition, pp. 373-378, 1992.
R.J. Baron, “Mechanism of human facial recognition,” Int'l J. Man
Machine Studies, vol. 15, pp. 137-178, 1981.
M. Bichsel, “Strategies of robust object recognition for identification
of human faces,” Ph.D. thesis, Eidgenossischen Technischen
Hochschule, Zurich, 1991.
T. Vetter and T. Poggio, "Linear object classes and image synthesis
from a single example image," IEEE Trans. Pattern Analysis and
Machin Intelligence, Vol. 19, no. 7, pp. 733-742, July 1997.
D. Beymer and T. Poggio, “Face recognition from one model view,”
Proc. Fifth Int’l Conf. Computer Vision, 1995.
T. Vetter and V. Blanz, “Estimating coloured 3D face models from
fingle images: An example based approach,” Proc. Conf. Computer
Vision (ECCV ’98), vol. II, 1998.
A.S. Georghiades, P.N. Belhumeur, and D.J. Kriegman, “From few to
many: Illumination cone models for face recognition under variable
lighting and pose,” IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 23, no. 6, pp. 643-660, 2001.
W. Zhao and R. Chellappa, “SFS based view synthesis for robust face
recognition,” Proc. Int’l Conf. Automatic Face and Gesture
Recognition, pp. 285-292, 2000.
[60]
[61]
[62]
[63]
[64]
[65]
[66]
[67]
[68]
[69]
[70]
[71]
[72]
[73]
[74]
[75]
[76]
[77]
[78]
[79]
[80]
[81]
[82]
[83]
[84]
102
V. Blanz and T. Vetter, “Face recognition based on fitting a 3D
morphable model,” IEEE Trans. On Pattern Analysis and Machine
Intelligence, vol. 25, no. 9, September 2003.
B. Takács, “Comparing face images using the modified hausdorff
distance,” Pattern Recognition, vol. 31, pp. 1873-1881, 1998.
Y. Gao and K.H. Leung, “Face recognition using line edge map,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 24,
no. 6, June 2002.
M.K.H. Leung and Y.H. Yang, “Dynamic two-strip algorithm in curve
fitting,” Pattern Recognition, vol. 23, pp. 69-79, 1990.
Bern Univ. Face Database,
ftp://iamftp.unibe.ch/pub/Images/FaceImages/, 2002.
Purdue Univ. Face Database,
http://rvl1.ecn.purdue.edu/~aleix/aleix_face_DB.html, 2002.
V.N. Vapnik, “The nature of statistical learning theory,” New York:
Springverlag, 1995.
C.J. Lin, “On the convergence of the decomposition method for
support vector machines,” IEEE Transactions on
Neural
Networks,2001.
G. Guo, S.Z. Li, and K. Chan, “Face recognition by support vector
machines,” In proc. IEEE International Conference on Automatic Face
and Gesture Recognition, pp. 196-201, 2000.
P.J. Phillips, “Support vector machines applied to face recognition,”
Processing system 11, 1999.
B. Heisele, P. Ho, and T. Poggio, “Face recognition with support
vector machines: Global versus component-based approach,” in
International Conference on Computer Vision (ICCV'01), 2001.
J. Huang, V. Blanz, and B. Heisele, “Face recognition using
Component-Based support vector machine Classification and
Morphable models,” LNCS 2388, pp. 334-341, 2002.
K.Jonsson, J. Mates, J. Kittler and Y.P. Li, “Learning support vectors
for face verification and recognition,” Fourth IEEE International
Conference on Automatic Face and Gesture Recognition 2000, pp.
208-213, Los Alamitos, USA, March 2000.
G.D. Guo, H.J. Zhang, S.Z. Li. "Pairwise face recognition". In
Proceedings of 8th IEEE International Conference on Computer
Vision. Vancouver, Canada. July 9-12, 2001.
O. Deniz, M. Castrillon, M. Hernandez, “Face recognition using
independent component analysis and support vector machines,”
Pattern Recognition Letters, vol. 24, pp. 2153-2157, 2003.
Y. Li, S. Gong and H. Liddell. Support vector regression and
classification based multi-view face detection and recognition. In Proc.
IEEE International Conference on Face and Gesture Recognition,
Grenoble, France, March 2000.
K. I. Kim, K. Jung, and J. Kim, “Face recognition using support vector
machines with local correlation kernels,” International Journal of
Pattern Recognition and Artificial Intelligence, vol. 16 no. 1, pp. 97111, 2002.
K. Jonsson, J. Kittler, Y. P. Li, and J. Matas, “Support vector machines
for face authentication,” in T. Pridmore and D. Elliman, editors, British
Machine Vision Conference, pp. 543–553, 1999.
Huang J., X. Shao, and H. Wechsler, "Face pose discrimination using
support vector machines,” 14th International Conference on Pattern
Recognition, (ICPR), Brisbane, Queensland, Aus, 1998.
S. Pang, D. Kim, S.Y. Bang, “Membership authentication in the
dynamic group by face classification using SVM ensemble,” Pattern
Recognition Letters vol. 24, pp. 215-225, 2003.
A.S. Tolba, " A parameter–based combined classifier for invariant face
recognition," Cybernetics and Systems, vol. 31, pp. 289-302, 2000.
A.S. Tolba, and A.N. Abu-Rezq, "Combined classifiers for invariant
face recognition," Pattern Anal. Appl. Vol. 3, no. 4, pp. 289-302, 2000.
Ho-Man Tang, Michael Lyu, and Irwin King, "Face recognition
committee machine," In Proceedings of IEEE International Conference
on Acoustics, Speech, and Signal Processing (ICASSP 2003), pp. 837840, April 6-10, 2003.
Rui Huang, Vladimir Pavlovic, and Dimitris N. Metaxas, "A hybrid
face recognition method using Markov random fields," ICPR (3) , pp.
157-160, 2004.
A.S. Tolba, A.H. El-Baz, and A.A. El-Harby, "A robust boosted
parameter- based combined classifier for pattern recognition,"
submitted for publication.
John A. Black, M. Gargesha, K. Kahol, P. Kuchi, Sethuraman
Panchanathan,” A Framework for performance evaluation of face
International Journal of Signal Processing Volume 2 Number 2
[85]
[86]
[87]
[88]
[89]
[90]
[91]
[92]
[93]
[94]
[95]
[96]
[97]
recognition algorithms,” in Proceedings of the International
Conference on ITCOM, Internet Multimedia Systems II, 2002.
The FacePix reference image set is in the public domain, Available:
http://cubic.asu.edu/vccl/imagesets/facepix.
P.J. Phillips, P. Grother, R.J. Michaels, D.M. Blackburn, E. Tabassi,
and M. Bone, “Face recognition Vendor Test 2002: Evaluation
Report,” NISTIR 6965, NAT. Inst. Of Standards and Technology 2003.
The At & T Database of Faces , Available:
http://www.uk.research.att.com/facedatabase.html
The Oulu Physics database, Available:
http://www.ee.oulu.fi/research/imag/color/pbfd.html
The XM2VTS database, Available:
http://www.ee.surrey.ac.uk/Reseach/VSSP/xm2vtsdb/
The Yale database, Available: http://cvc.yale.edu/
The Yale B database, Available:
http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html
The MIT face database, Available:
ftp://whitehapel.media.mit.edu/pub/images/
The CMU PIE database, Available:
http://www.ri.cmu.edu/projects/project_418.html
The UMIST database, Available:
http://images.ee.umist.ac.uk/danny/database.html
The University of Stirling online database , Available:
http://pics.psych.stir.ac.uk/
The FERET database, Available:
http://www.it1.nist.gov/iad/humanid/feret/
Kuwait University Face Database, Available:
http://www.sc.kuniv.edu.kw/lessons/9503587/dina.htm
A.S. Tolba received his B.Sc. with honors and M.Sc.
from Mansoura University (Egypt) in 1978 and 1981,
respectively. He received his Ph.D. from Wuppertal
University (Germany) in 1988. Since 2000, he has been
a full professor of computer engineering at University
of Suez-Canal (Egypt).
He was Secondment at the Department of Applied
Physics at Kuwait University. He is currently a dean of
the of the Faulty of Computer and Information systems
at Mansoura University (Egypt). He has done research in computer vision,
biometric identification, human-computer interaction, autonomous vehicles,
neural networks, and laser interferometry. He has published over 50 papers in
these areas. He is coauthor of two edited books: "Intelligent Robotic
Systems," (Marcel Dekker, New York, 1991) and "Laser Technology and its
Application," (Publication of ISESCO, 1997). His most recent research focus
is face/gesture recognition. Prof. Tolba is a member of the IEEE and AMSE.
He is currently a member of the editorial bard of Modeling, Measurement, and
Control.
A.H. El-Baz received his B.Sc. with honors and
M.Sc. from Mathematical Department, Damietta
Faculty of Science, New Damietta, Egypt in 1997 and
2002, respectively. He is working toward the Ph.D. at
Mansoura University, Egypt.
His research interest is in the area of pattern
recognition, signal/image processing, and computer
vision, especially automated face recognition.
vision, image retrieval.
A.H. El-Harby received his B.Sc. and M.Sc. degrees
from computer science Department, Suez-Canal
University, Egypt and Mathematical Department,
Damietta Faculty of Science, New Damietta, Egypt,
respectively. He received his Ph.D. degree in
computer engineering from Keele University, UK.
His thesis is on automatic extraction of vector
representations of line features from remotely sensed
images. His research interest include remote sensing,
image processing, pattern recognition, computer
103
View publication stats