Face Detection and Smile Detection

Yu-Hao Huang (黃昱豪)1, Chiou-Shann Fuh (傅楸善)2

1 Dept. of Computer Science and Information Engineering, National Taiwan University
E-mail: r94013@csie.ntu.edu.tw
2 Dept. of Computer Science and Information Engineering, National Taiwan University
E-mail: fuh@csie.ntu.edu.tw
2.2.2. AdaBoost

There is a large number of rectangle features with different sizes. For example, for a 24 by 24 pixel image, there are 160,000 features. AdaBoost is a machine-learning algorithm used to find the T best weak classifiers with minimum weighted error. To obtain the T classifiers, we repeat the following algorithm for T iterations:
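One round of this selection can be sketched with decision stumps (a threshold on a single feature value) as the weak classifiers. This is a generic AdaBoost round in NumPy, not the authors' implementation; `best_stump` and `adaboost_round` are illustrative names, and in the Viola-Jones setting each column of `features` would hold one rectangle feature evaluated on every training window.

```python
import numpy as np

def best_stump(features, labels, weights):
    """Pick the (feature, threshold, polarity) stump with minimum weighted
    error -- the classifier-selection step of one AdaBoost round.
    features: (n_samples, n_features) feature values
    labels:   (n_samples,) in {+1, -1}; weights sum to 1."""
    n_samples, n_features = features.shape
    best = (None, None, None, np.inf)          # (j, theta, polarity, error)
    for j in range(n_features):
        for theta in np.unique(features[:, j]):
            for polarity in (+1, -1):
                pred = np.where(polarity * (features[:, j] - theta) > 0, 1, -1)
                err = weights[pred != labels].sum()
                if err < best[3]:
                    best = (j, theta, polarity, err)
    return best

def adaboost_round(features, labels, weights):
    """Select a stump, compute its vote alpha, and reweight the samples
    so the next round focuses on the examples this stump got wrong."""
    j, theta, polarity, err = best_stump(features, labels, weights)
    err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
    alpha = 0.5 * np.log((1 - err) / err)
    pred = np.where(polarity * (features[:, j] - theta) > 0, 1, -1)
    weights = weights * np.exp(-alpha * labels * pred)
    return (j, theta, polarity, alpha), weights / weights.sum()
```

After T such rounds, the strong classifier is the sign of the alpha-weighted sum of the T stump outputs.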
[Table 1 (fragment): 3: left mouth corner 64.68 78.38 2.99 4.15]

Finally we have the linear equations:

\sum_{(x,y) \in R} I_x^2 \, u + \sum_{(x,y) \in R} I_x I_y \, v = -\sum_{(x,y) \in R} I_x I_t

\sum_{(x,y) \in R} I_x I_y \, u + \sum_{(x,y) \in R} I_y^2 \, v = -\sum_{(x,y) \in R} I_y I_t

By solving the linear equations, we can obtain the optical flow vector (u, v) for (x, y). We use the method of Lucas and Kanade [8] to solve for (u, v) iteratively; it is similar to Newton's method:
1. Choose a (u, v) arbitrarily, shift (x, y) to (x + u, y + v), and calculate the corresponding Ix and Iy.
2. Solve for the new (u', v') and update (u, v) to (u + u', v + v').
3. Repeat Step 1 until (u', v') converges.
To achieve fast feature point tracking, we build pyramid images of the current and previous frames with four levels. At each level we search for the corresponding point in a 10 by 10 pixel window, and stop the search and move to the next level at an accuracy of 0.01 pixels.

4. SMILE DETECTION SCHEME

We have proposed a fast video-based smile detection method with generally low misdetection and false alarm rates: 11.5% smile misdetection rate and 12.04% false alarm rate on the FGNET database. Our smile detection algorithm is as follows:
1. Detect the first human face in the first image frame and locate the twenty standard facial feature positions.
2. In every image frame, use optical flow to track the positions of the left and right mouth corners with an accuracy of 0.01 pixels, and update the standard facial feature positions by face tracking and detection.
3. If the x-direction distance between the tracked left and right mouth corners is larger than the standard distance plus a threshold Tsmile, we claim a smile is detected.
4. Repeat Steps 2 and 3.
In the smile detector application, we consider the x-direction distance between the right and left mouth corners to play the key role in the human smile action. We do not use the y-direction displacement, since the user can rotate the head slightly up or down, and that would falsely trigger our detector. How do we decide the threshold Tsmile? As shown in Table 1, the mean distance between the left and right mouth corners is 29.98 pixels, and their standard deviations are 2.49 and 2.99 pixels. Let Dmean be 29.98 pixels and Dstd be 2.49 + 2.99 = 5.48 pixels. In each frame, let Dx be the x distance between the two mouth corners. If Dx is greater than Dmean + Tsmile, it is a smile; otherwise it is not. With a large Tsmile we have a high misdetection rate and a low false alarm rate, and with a small Tsmile a low misdetection rate and a high false alarm rate. We ran different values of Tsmile on the FGNET database; the results are shown in Table 2. We use 0.55 Dstd = 3.014 pixels as our standard Tsmile, giving an 11.5% misdetection rate and a 12.04% false alarm rate.

Threshold   | Misdetection Rate | False Alarm Rate
0.4*Dstd    | 6.66%             | 19.73%
0.5*Dstd    | 9.25%             | 14.04%
0.55*Dstd   | 11.50%            | 12.04%
0.6*Dstd    | 13.01%            | 8.71%
0.7*Dstd    | 18.82%            | 4.24%
0.8*Dstd    | 25.71%            | 2.30%

Table 2 Misdetection rate and false alarm rate with different thresholds.

5. REAL-TIME SMILE DETECTION

It is important to note that feature tracking accumulates errors over time, which can lead to misdetections or false alarms. We do not want users to take an initial neutral photograph every few seconds, which would be annoying and unrealistic. Moreover, it is difficult to identify the right moment to refine the feature positions: if the user is performing a facial expression when we refine the feature locations, we would be led to a wrong point to track. Here we propose a method to refine the features automatically for real-time usage. Section 5.1 describes our algorithm and Section 5.2 shows some experiments.

5.1. Feature Refinement

From the very first image, we have the user's face image with a neutral facial expression. We build the user's mouth pattern grey image at that time. The mouth rectangle is bounded by four feature points: the right mouth corner, the center point of the upper lip, the left mouth corner, and the center point of the lower lip. We actually expand the rectangle by one standard deviation in each direction. Figure 13 shows the user's face and Figure 14 shows the mouth pattern image. For each following image, we use normalized cross correlation (NCC) block matching to find the block around the new mouth region that best matches the pattern image and calculate their cross correlation value. The NCC equation is:

C = \frac{\sum_{(x,y) \in R,\,(u,v) \in R'} (f(x,y) - \bar{f})(g(u,v) - \bar{g})}{\sqrt{\sum_{(x,y) \in R} (f(x,y) - \bar{f})^2 \sum_{(u,v) \in R'} (g(u,v) - \bar{g})^2}}

The equation gives the cross correlation between the two blocks R and R'. If the correlation value is larger than some threshold, which we describe more precisely later, the mouth state is very close to the neutral one rather than an open mouth, a smiling mouth, or another state, and we then relocate the feature positions. To avoid spending too much computation time on finding the matching block, we center the search region at the initial position. To overcome non-sub-pixel block matching, we set the search range to a three by three block and take the position with the largest correlation value as our result.

[Table 3 (image residue): cross correlation value 0.925]
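Under the stated assumptions (grey-level blocks, a 3 by 3 search centered at the initial position), the refinement matching can be sketched as a direct NumPy transcription of the C equation above; the function names `ncc` and `best_match` are mine, not from the paper.

```python
import numpy as np

def ncc(f, g):
    """Normalized cross correlation between two equal-shaped grey blocks:
    mean-subtract each block, then divide the cross term by the product
    of the two root-sum-square deviations (the C equation)."""
    fd = f.astype(float) - f.mean()
    gd = g.astype(float) - g.mean()
    denom = np.sqrt((fd * fd).sum() * (gd * gd).sum())
    return (fd * gd).sum() / denom if denom > 0 else 0.0

def best_match(pattern, image, x0, y0, search=1):
    """Search a (2*search+1)^2 neighbourhood around (x0, y0) -- 3x3 for
    search=1, as in the text -- and return the top-left offset and value
    of the block with the largest NCC against the stored mouth pattern."""
    h, w = pattern.shape
    best_val, best_xy = -1.0, (x0, y0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = y0 + dy, x0 + dx
            block = image[y:y + h, x:x + w]
            if block.shape != pattern.shape:
                continue                     # skip out-of-bounds positions
            v = ncc(pattern, block)
            if v > best_val:
                best_val, best_xy = v, (x, y)
    return best_xy, best_val
```

A block identical to the pattern scores 1.0, which is why a near-neutral mouth (high C) is a safe moment to relocate the features.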
5.2. Experiment

As we have mentioned, we want to know the threshold value at which to do the refinement. We have a real-time case in Section 5.2.1 to show how the correlation value changes with a smile expression, and an off-line case on the FGNET face database.

5.2.1. Real-Time Case

Table 3 shows a sequence of images and their cross correlation values.
[Table 3 (image residue): cross correlation value 0.502]
Table 3 Cross correlation value of mouth pattern for smile activity.
[Figure residue: plot of cross correlation value over the sequence (axis values 0.9, 0.8)]
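Putting Sections 4 and 5 together, the per-frame decision can be sketched as follows. Dmean and Tsmile come from the text (29.98 pixels and 0.55 * Dstd = 3.014 pixels); the correlation threshold `T_REFINE = 0.8` is purely an assumed placeholder, since the paper chooses the real value from experiments such as Table 3, where a neutral mouth scored 0.925 and a smiling one 0.502.

```python
D_MEAN = 29.98    # mean mouth-corner x distance from Table 1 (pixels)
T_SMILE = 3.014   # 0.55 * Dstd = 0.55 * 5.48 (pixels)
T_REFINE = 0.8    # ASSUMED correlation threshold; the paper picks it
                  # experimentally (neutral ~0.925 vs smile ~0.502)

def process_frame(dx, correlation):
    """Per-frame decision:
    dx          -- tracked x distance between the mouth corners (pixels)
    correlation -- NCC of the current mouth region vs the neutral pattern
    Returns (smile_detected, refine_features)."""
    smile = dx > D_MEAN + T_SMILE                 # Section 4, Step 3
    # Only relocate features when the mouth looks neutral again,
    # so an ongoing expression cannot corrupt the tracked positions.
    refine = (not smile) and correlation > T_REFINE
    return smile, refine
```

Refining only on high-correlation (near-neutral) frames is exactly what keeps the optical-flow tracking error from accumulating without asking the user for repeated neutral photographs.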