TR96-35
http://www.merl.com
Abstract
The appeal of computer games may be enhanced by vision-based user inputs. The high speed
and low cost requirements for near-term, mass-market game applications make system design
challenging. The response time of the vision interface should be less than a video frame time
and the interface should cost less than US $50. We meet these constraints with algorithms
tailored to particular hardware. We have developed a special detector, called the artificial retina
chip, which allows for fast, on-chip image processing. We describe two algorithms, based on
image moments and orientation histograms, which exploit the capabilities of the chip to provide
interactive response to the player's hand or body positions at a 10 msec frame time and at low cost.
We show several possible game interactions.
2nd International Conference on Automatic Face and Gesture Recognition, Killington, VT, USA
This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part
without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include
the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of
the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or
republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All
rights reserved.
3 Algorithms
Our goal is to infer useful information about the position, size, orientation, or configuration of the player's body or hands. We seek fast, reliable algorithms for the inexpensive AR processor module.

We have chosen two algorithms. One uses image moments to calculate an equivalent rectangle for the current image. Another uses orientation histograms to select the body pose from a menu of templates. The first exploits the image projection capabilities of the AR module; the second uses its ability to quickly calculate x and y derivatives.
3.1 Image Moments
Figure 2: 32x32 input images of the user's hand, and the equivalent rectangle having the same first- and second-order moments as those of the image. X-Y position, orientation, and projected width are measured from the rectangle. (Projected height is also measured, but with the hand extending off the picture as shown here, height is redundant with the vertical position of the center of mass.)

Image moments [9, 1] provide useful summaries of global image information, and have been applied to shape analysis and other tasks, often for binary images. The moments involve sums over all pixels, and so are robust against small changes in pixel values. Within the structure of a computer game, they can also provide sufficient information for the computer to reliably interpret control inputs from the user's body position. Characteristics of the AR chip allow fast calculation of these moments.
If $I(x, y)$ is the image intensity at position $(x, y)$, then the image moments, up to second order, are:

$$M_{00} = \sum_x \sum_y I(x,y), \qquad M_{10} = \sum_x \sum_y x\, I(x,y), \qquad M_{01} = \sum_x \sum_y y\, I(x,y),$$
$$M_{11} = \sum_x \sum_y x y\, I(x,y), \qquad M_{20} = \sum_x \sum_y x^2\, I(x,y), \qquad M_{02} = \sum_x \sum_y y^2\, I(x,y).$$

With $x_c = M_{10}/M_{00}$ and $y_c = M_{01}/M_{00}$ the center of mass, define the intermediate variables a, b, and c:

$$a = \frac{M_{20}}{M_{00}} - x_c^2, \qquad b = 2\left(\frac{M_{11}}{M_{00}} - x_c y_c\right), \qquad c = \frac{M_{02}}{M_{00}} - y_c^2. \tag{3}$$
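For concreteness, the following is a minimal NumPy sketch (not the AR-module implementation) of computing these moments and reading off the equivalent rectangle. The image is assumed to be an array indexed as I[y, x]; the orientation uses the standard axis-of-least-inertia relation tan(2θ) = b/(a − c), and the side lengths use the uniform-rectangle relation between side length and second central moment, which may differ in constant factors from the paper's exact expressions.

```python
import numpy as np

def image_moments(I):
    """Raw image moments up to second order for an image indexed I[y, x]."""
    I = I.astype(float)
    ys, xs = np.indices(I.shape)
    M00 = I.sum()
    M10, M01 = (xs * I).sum(), (ys * I).sum()
    M11 = (xs * ys * I).sum()
    M20, M02 = (xs ** 2 * I).sum(), (ys ** 2 * I).sum()
    return M00, M10, M01, M11, M20, M02

def equivalent_rectangle(I):
    """Center, orientation, and side lengths of the equivalent rectangle."""
    M00, M10, M01, M11, M20, M02 = image_moments(I)
    xc, yc = M10 / M00, M01 / M00               # center of mass
    a = M20 / M00 - xc ** 2                     # intermediate variables of Eq. (3)
    b = 2 * (M11 / M00 - xc * yc)
    c = M02 / M00 - yc ** 2
    theta = 0.5 * np.arctan2(b, a - c)          # orientation of the long axis
    d = np.hypot(b, a - c)                      # sqrt(b^2 + (a - c)^2)
    length = np.sqrt(6 * (a + c + d))           # long side of the rectangle
    width = np.sqrt(6 * max(a + c - d, 0.0))    # short side (clamped for round-off)
    return (xc, yc), theta, length, width
```

Per frame, the interface can then read off the center, orientation, and projected width of this rectangle, as in Figure 2.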
Figure 7 shows the small test set of possible poses from which the algorithm chose.

Figure 5: Three image projections determine the image moments. The horizontal and vertical projections can be performed on the artificial retina detector itself, approximately doubling the throughput.
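To make Figure 5 concrete: all of the moments above can be recovered from three 1-D projections, which is what allows part of the computation to move onto the detector. The vertical projection V(x) = Σ_y I(x, y) yields M00, M10, and M20; the horizontal projection H(y) = Σ_x I(x, y) yields M01 and M02; and a projection D(t) along lines of constant t = x + y yields M11, since Σ_t t² D(t) = M20 + 2 M11 + M02. The sketch below is an illustrative reconstruction of that bookkeeping, not the on-chip procedure.

```python
import numpy as np

def moments_from_projections(I):
    """Recover the second-order image moments from three 1-D projections of I[y, x]."""
    I = I.astype(float)
    ny, nx = I.shape
    V = I.sum(axis=0)                       # vertical projection, indexed by x
    H = I.sum(axis=1)                       # horizontal projection, indexed by y
    D = np.zeros(nx + ny - 1)               # diagonal projection, indexed by t = x + y
    for r in range(ny):
        D[r:r + nx] += I[r]
    x, y, t = np.arange(nx), np.arange(ny), np.arange(nx + ny - 1)
    M00 = V.sum()
    M10, M20 = (x * V).sum(), (x ** 2 * V).sum()
    M01, M02 = (y * H).sum(), (y ** 2 * H).sum()
    M11 = ((t ** 2 * D).sum() - M20 - M02) / 2.0   # from the sum of (x + y)^2 I(x, y)
    return M00, M10, M01, M11, M20, M02
```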
Figure 8: Left: sample input images for the flying game. Middle: orientation images, from which orientation histograms are calculated. Right: corresponding game action.
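The orientation-histogram computation itself follows [6, 11]: the AR module supplies the x and y derivatives from which the orientation images of Figure 8 and their histograms are formed, and the pose is chosen as the nearest stored template. The sketch below is a rough illustration of that idea rather than the pipeline run on the hardware; the finite-difference gradients, the 36 orientation bins, the magnitude threshold, the Euclidean distance, and the pose names are all illustrative assumptions.

```python
import numpy as np

def orientation_histogram(image, n_bins=36, mag_frac=0.1):
    """Normalized histogram of gradient orientations (illustrative)."""
    img = image.astype(float)
    dx = np.gradient(img, axis=1)                    # x derivative
    dy = np.gradient(img, axis=0)                    # y derivative
    mag = np.hypot(dx, dy)
    ang = np.arctan2(dy, dx)                         # orientation, -pi..pi
    keep = mag > mag_frac * mag.max()                # ignore weak gradients
    bins = ((ang[keep] + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins, minlength=n_bins).astype(float)
    return hist / (hist.sum() + 1e-9)                # normalize

def closest_pose(image, templates):
    """Pick the template pose whose histogram is nearest the input's."""
    h = orientation_histogram(image)
    return min(templates, key=lambda name: np.linalg.norm(h - templates[name]))

# Usage sketch (hypothetical pose names and images):
#   templates = {"eject": orientation_histogram(img_eject), ...}
#   action = closest_pose(current_frame, templates)
```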
Figure 7: Test images for Fig. 6. Three people made ten poses, with four poses in common. Backgrounds were removed by subtraction. That can leave a ghost of the background inside the figure, but the effects of such residuals were negligible in the recognition performance.

We have described two algorithms for computer game applications. One is based on image moments, and the other on orientation histograms. These algorithms respond to the user's hand or body positions within 10 msec, with hardware that will cost several tens of dollars. We are developing computer games with these algorithms and this hardware.

References

[1] D. H. Ballard and C. M. Brown, editors. Computer Vision. Prentice Hall, 1982.

[2] D. Beymer and T. Poggio. Face recognition from one example view. In Proc. 5th Intl. Conf. on Computer Vision, pages 500-507. IEEE, 1995.

[3] M. Bichsel. International Workshop on Automatic Face- and Gesture-Recognition. IEEE Computer Society, 1995.

[4] M. J. Black and Y. Yacoob. Tracking and recognizing rigid and non-rigid facial motions using local parametric models of image motion. In Proc. 5th Intl. Conf. on Computer Vision, pages 374-381. IEEE, 1995.

[5] E. Hunter, J. Schlenzig, and R. Jain. Posture estimation in reduced-model gesture input systems. In M. Bichsel, editor, Intl. Workshop on Automatic Face- and Gesture-Recognition, pages 290-295, Zurich, Switzerland, 1995. Dept. of Computer Science, University of Zurich, CH-8057.

[6] W. T. Freeman and M. Roth. Orientation histograms for hand gesture recognition. In M. Bichsel, editor, Intl. Workshop on Automatic Face- and Gesture-Recognition, Zurich, Switzerland, 1995. Dept. of Computer Science, University of Zurich, CH-8057.

[7] E. Funatsu, Y. Nitta, M. Miyake, T. Toyoda, K. Hara, H. Yagi, J. Ohta, and K. Kyuma. SPIE, 2597(283), 1995.

[8] D. M. Gavrila and L. S. Davis. Towards 3-d model-based tracking and recognition of human movement: a multi-view approach. In M. Bichsel, editor, Intl. Workshop on Automatic Face- and Gesture-Recognition, pages 272-277, Zurich, Switzerland, 1995. Dept. of Computer Science, University of Zurich, CH-8057.

[9] B. K. P. Horn. Robot Vision. MIT Press, 1986.

[10] K. Kyuma, E. Lange, J. Ohta, A. Hermanns, B. Banish, and M. Oita. Nature, 372(197), 1994.

[11] R. K. McConnell. Method of and apparatus for pattern recognition. U.S. Patent No. 4,567,610, Jan. 1986.

[12] A. P. Pentland. Smart rooms. Scientific American, 274(4):68-76, 1996.