HUMAN COMPUTER INTERFACE BASED ON FACE TRACKING
FOR PHYSICALLY CHALLENGED USERS

A PROJECT REPORT

Submitted by

ABDUL ASIM A.
AFSHAN S.
ANAND R.

BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
Professor
ANNA UNIVERSITY : CHENNAI 600 025
The viva-voce examination of the following students, who have submitted the
project work “HUMAN COMPUTER INTERFACE BASED ON FACE TRACKING
FOR PHYSICALLY CHALLENGED USERS”, is held on _____________.
ACKNOWLEDGEMENT
We express our thanks to our project coordinator Ms. R. REVATHY, Senior Lecturer,
Department of Information Technology, for her valuable suggestions at every stage of
our project.
We record our sincere thanks to our guide Dr. ANGELINA GEETHA, Professor,
Department of Computer Science, for being instrumental in the completion of our
project with her exemplary guidance.
We thank all the staff members of our department for their valuable support and
assistance at various stages of our project development.
TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
1. INTRODUCTION
2. PROBLEM DEFINITION
3. DEVELOPMENT PROCESS
   3.3 Design
      3.3.1 System Architecture
      3.3.2 Detailed Design
         3.3.2.1 User Interface
         3.3.2.2 Module Description
   3.4 Implementation
   3.5 Testing
4. APPLICATIONS AND FUTURE ENHANCEMENT
5. CONCLUSION
APPENDIX A – SCREENSHOTS
REFERENCES
ABSTRACT

Although systems for computer access are available for disabled people, these
systems are expensive and require sophisticated hardware support. In this context, this
system focuses on helping such users by providing a Human Computer Interface which is
inexpensive and easy to implement. Human Computer Interaction is concerned with the
design, evaluation and implementation of interactive computing systems for human use
and with the study of major phenomena surrounding them. We propose an interface for
people with severe disabilities based on face tracking. Body features like the eyes and
the lips may also be used for implementing a human computer interface, but with some
limitations. In eye tracking, the motion of the pupil is hard to track with a web camera,
which would be the primary mode of input in the proposed system. For a physically
challenged user, moving the face itself demands greater effort, and hence the finer
intricacies of eyeball and lip movement cannot be considered. The system depends on a
web camera for input and hence would be affordable for the target users. User
friendliness is enhanced as the system is devoid of any sophisticated hardware
requirement.
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 Feature Detection
Feature detection is a process by which specialized nerve cells in the brain respond
to specific features of a visual stimulus, such as lines, edges, angles, or movement. The
nerve cells fire selectively in response to stimuli that have specific characteristics.
Feature detection was discovered by David Hubel and Torsten Wiesel of Harvard
University.
In computer vision and image processing, the concept of feature detection refers to
methods that aim at computing abstractions of image information and making a local
decision at every image point as to whether an image feature of a given type is present
at that point. The resulting features will be subsets of the image domain, often in the
form of isolated points, continuous curves or connected regions.
Once features have been detected, a local image patch around the feature can be
extracted. This extraction may involve quite considerable amounts of image processing.
The result is known as a feature descriptor or feature vector.
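As a minimal illustration of such local feature decisions, the sketch below detects
corner features with the OpenCV library used elsewhere in this project. The input file
name and the parameter values are assumptions made for the example, not values from
our system.

// Illustrative sketch: corner feature detection with the OpenCV C API
#include <cv.h>
#include <highgui.h>
#include <stdio.h>

int main( void )
{
    IplImage* img  = cvLoadImage( "face.jpg", CV_LOAD_IMAGE_GRAYSCALE );
    IplImage* eig  = cvCreateImage( cvGetSize(img), IPL_DEPTH_32F, 1 );
    IplImage* temp = cvCreateImage( cvGetSize(img), IPL_DEPTH_32F, 1 );

    CvPoint2D32f corners[100];
    int corner_count = 100;

    // A local decision at every image point: keep only the strongest corners
    cvGoodFeaturesToTrack( img, eig, temp, corners, &corner_count,
                           0.01, 10, NULL, 3, 0, 0.04 );

    printf( "Detected %d corner features\n", corner_count );
    return 0;
}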
Types of tracking:
Eye Tracking:
Eye tracking is the process of measuring either the point of gaze or the
motion of an eye relative to the head. An eye tracker is a device for measuring eye
positions and eye movements. There are a number of methods for measuring eye
movements. The most popular variant uses video images from which the eye position is
extracted. Other methods use search coils or are based on the electro-oculogram. Two
general types of eye tracking techniques are used: Bright Pupil and Dark Pupil. Their
difference is based on the location of the illumination source with respect to the optics. If
the illumination is coaxial with the optical path, then the eye acts as a retro-reflector as
the light reflects off the retina creating a bright pupil effect similar to red eye. If the
illumination source is offset from the optical path, then the pupil appears dark because
the retro-reflection from the retina is directed away from the camera.
Head Tracking:
Head tracking technology consists of a device transmitting a signal from atop the
computer monitor and tracking a reflector placed on the user's head or eyeglasses. It
serves as a mouse alternative, allowing the person to control the mouse cursor by
moving his/her head. Once calibrated, the movement of the user's head determines the
direction in which the onscreen cursor travels. An example of a head tracking system is
given in Figure 1.1.
Face Detection:
Face detection is a computer technology that determines the locations and sizes of
human faces in arbitrary (digital) images. It detects facial features and ignores anything
else, such as buildings, trees and bodies.
Face detection can be regarded as a more general case of face localization. In face
localization, the task is to find the locations and sizes of a known number of faces
(usually one). In face detection, one does not have this additional information.
Face detection is used in biometrics, often as a part of (or together with) a facial
recognition system. It is also used in video surveillance, human computer interface and
image database management. Some recent digital cameras use face detection for
autofocus. Also, face detection is useful for selecting regions of interest in photo
slideshows that use a pan-and-scale effect.
Face models can be represented explicitly by example face images or implicitly by
neural networks or other mechanisms. The parameters for these models are adjusted
either automatically from example images or by hand.

One such algorithm is a statistical method for three-dimensional object detection.
The statistics of both object appearance and non-object appearance are represented
using histograms. Each histogram represents the joint statistics of a subset of wavelet
coefficients and their position on the object. This approach uses many such histograms
to represent a wide variety of visual attributes. The algorithm was the first of its kind
to reliably detect human faces with out-of-plane rotation.
CAMSHIFT Algorithm
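CAMSHIFT (Continuously Adaptive Mean Shift), described by Gary R. Bradski (1998),
tracks an object such as a face by climbing the gradient of a colour probability
distribution, adapting the size of its search window as the object moves. A minimal
sketch of one tracking step with the OpenCV C API is given below; it assumes a
back-projection image has already been computed for the current frame, and the helper
name camshift_step is illustrative.

// Sketch of one CAMSHIFT iteration; backproject is assumed to hold the
// flesh-colour probability image for the current frame
#include <cv.h>

CvRect track_window;            // initialized from the detected face rectangle
CvConnectedComp track_comp;
CvBox2D track_box;

void camshift_step( IplImage* backproject )
{
    // Shift and resize the search window toward the distribution peak
    cvCamShift( backproject, track_window,
                cvTermCriteria( CV_TERMCRIT_EPS | CV_TERMCRIT_ITER, 10, 1 ),
                &track_comp, &track_box );
    track_window = track_comp.rect;   // carry the window into the next frame
}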
Persons with severe motion impairments, such as paraplegics and quadriplegics, face
difficulty in accessing computer-based systems since they cannot use conventional
computer access devices like the mouse or keyboard. Alternate computer interfaces based
on tracking of body features need to be developed for these users. The challenge lies in
designing a system which would serve as a general interface between computers and
physically challenged users.
Pointing devices like the mouse and trackball enable users to control a pointer and
interact with a graphical user interface. The current human-computer interaction mode,
based primarily on the keyboard and the mouse, has seen little change since the
advent of modern computing. Computers now commonly come with cameras as standard
equipment. Hence it is desirable to employ them in designing next-generation human
computer interaction devices. The feasibility of interfaces based on speech-driven input
has also been extensively investigated.
In the works of James Gips, Margrit Betke and Peter Fleming (2000), preliminary
investigations have been carried out for the design of a human computer interface for
quadriplegic and non-verbal users. The system has been broken down into two main
components. The first component is the Vision Computer which receives real-time input
from a camera mounted on the monitor. The second component is the User’s Computer
which runs a special driver program in the background to translate the user’s movement
from the input device into mouse movements on the screen.
A camera mouse system was developed by James Gips, Margrit Betke and Peter
Fleming (2002). The system makes use of body features like the tip of the user’s nose or
finger or face to track the position of the mouse. Various body features are examined for
tracking reliability and user convenience. The visual tracking algorithm used in this
system is based on cropping an online template of the tracked feature from the current
image frame and testing where this template correlates in the subsequent frame. The
location of the highest correlation is interpreted as the new location of the feature in the
subsequent frame. Our system adopts part of the modules of this algorithm for the
regular updating of image frames.
A face detection algorithm based on skin color has been proposed by Sanjay Singh,
D.S. Chauhan, Mayank Vatsa and Richa Singh (2003). The authors have discussed
various algorithms based on skin color. Three main color spaces, RGB, YCbCr and
HSI, have been combined to obtain a new skin color based face detection algorithm which
achieves higher accuracy. Our system draws on the face localization techniques
discussed in this publication.
In the works of Rajesh Kumar and Anupam Kumar (2008), alternate input systems
to replace the traditional mouse and keyboard are discussed. The authors have developed
an input system which uses the head and eyes to track the movements of the user. The
algorithm is based upon image matching using correlation coefficients. The system
comprises an image tracer module, and the cursor position is determined by calculating
the correlation coefficient of a tracing window in image space.
Our system makes use of the Haar face detection algorithm to recognize and track
faces from real time video input. The main tasks involved are webcam capture, face
detection and translation of facial movements into mouse movements. Although a web
camera is a low-resolution capture device, the Haar face detection algorithm processes
the video feed using a large number of classifier evaluations to localize faces, which
helps in achieving a high degree of accuracy.
2. PROBLEM DEFINITION
People with severe disabilities, resulting from birth, accidents or degenerative
diseases, and bedridden patients have been excluded from access to computers and even
lack proper means of communication with fellow human beings.
Information is presented in an inaccessible form to them. They are unable to speak and
have very little or no voluntary muscle control. In most cases, these people are able to
move only their heads. Their level of mental functioning might not be known because of
their inability to communicate. People with severe physical disabilities often are isolated,
spending hours in bed or in a wheelchair at home or in an institutional setting.
Computer and communication technology can make all the difference in the world
for people with profound physical disabilities. Our approach is to develop a computer
interface for the disabled using facial tracking. The challenge is to develop a low cost
system devoid of any sophisticated hardware for input. The system should be free from
any special hardware to track the desired feature as this may cause inconvenience to the
user.
The facial movements of the user are captured using a webcam and translated into
mouse pointer movements after preprocessing and applying a face detection algorithm.
Thus, by moving the face, the user is able to control the mouse. The interface
contains options for raising an alarm, summoning a nurse and playing audio and video for
entertainment. An on-screen message board has also been provided to enable the user to
communicate effectively.
3. DEVELOPMENT PROCESS
A software development process is a structure imposed on the development of a
software product. The activities concerned with the development of software are
collectively known as the Software Development Life Cycle (SDLC). The SDLC is a logical
process used by a systems analyst to develop an information system, including
requirements, validation, training, and user ownership. An SDLC should result in a
high-quality system that meets or exceeds customer expectations, reaches completion
within time and cost estimates, and works effectively and efficiently in the current and
planned information technology infrastructure.
The input for the human computer interface will be obtained from a web camera.
Since the interface would solely depend on the camera, care should be taken in choosing
the camera. A web camera is chosen over other video capture devices for
two reasons. First, a web camera is less expensive than other visual input devices,
which makes the system affordable to every individual. Second, the web camera does not
require any specialized drivers or software support, which makes it easy for the
developer to access real-time video feeds.
The facial movements of the user are captured through the camera in Visual C++.
The live video stream is fed to the face detection algorithm. The detected face is given as
input to the tracker module which translates the facial movements into mouse pointer
movements. This can then be used to access the user interface.
3.2.1 Hardware
The minimum hardware requirements for this project are listed in Table 1.
Hardware                Requirement
3.2.2 Software
The minimum software requirements for this project are listed in Table 2.
Software                Requirement
Operating System        Windows 2000/XP
Runtime Package         Microsoft Visual C++, Intel OpenCV
Webcam Drivers          Logitech/Microsoft SDK
3.3 DESIGN
3.3.1 System Architecture
The architecture of the system is represented in Figure 3.1. The system receives
real time input from the user via a web camera. The video stream is accessed via the
webcam capture module, because the vendor-supplied webcam software cannot be used to
interface the webcam with the face detection module.
The input from the camera is given to the face detection module. The core of the
face detection module contains the algorithm which works on localizing the facial
segments from the rest of the image. The algorithm is adapted to detect faces from
streaming video feeds.
After the face has been detected in the video stream, the movements of the face are
translated into mouse cursor movements on the screen and updated accordingly in real-
time. The position of the face is converted into onscreen coordinates and this is mapped
into mouse pointer coordinates in the tracker module. Hence, when the user moves his
face, the mouse cursor is moved correspondingly. This tracking module is interfaced with
the Graphical User Interface (GUI). Using the mouse movements, the user can interact
with the application interface.
Our system provides an efficient way for bed ridden people to interact with a
computer and also provides an efficient communication system. The main tasks to be
accomplished in the development of the proposed system are as follows:
• Accessing the video stream from the video camera in real time
• Detecting and localizing the user's face in the video stream
• Translating the facial motion into an input format which can be used to
manipulate the user interface
The system has been developed in Microsoft Visual C++. The system can be
executed by running the project executable file. The web camera has to be set up and
initialized before executing the system. The system will automatically detect the web
camera provided there is only one active camera at execution time.
The web camera must be fixed and focused on the facial region of the target user.
Care should be taken to align the camera in this way. The system tracks the signals
captured by the web camera, analyses and detects the face region. As the video stream
progresses, by applying the algorithm, the facial movement is detected. Once face
detection has been established, control passes to the mouse pointer and the user is able to
move the mouse pointer by moving his/her face.
At the center of the interface is a display window which shows the real time video
stream from the web camera. It displays the detected face which is updated constantly in
real-time. The interface has buttons to invoke various functions. The user is able to raise
an alarm, summon a nurse or play audio and video for entertainment purposes. An
onscreen message board can also be invoked for communication purposes. The invoked
function can be stopped using the stop button and the application can be closed using the
exit button provided in the interface.
The basic flow of the system is represented in Figure 3.2. The Human computer
interface for physically challenged users is made possible by the video feed from the web
camera. The modules of the proposed system are as follows:
1. Webcam Capture
2. Face Detector
3. Tracker Module
4. Application Interface
Figure 3.2: System Flow diagram
The input for the system is captured using the web camera. Lighting conditions
should also be favourable. The bundled software supplied with the camera can be used to
capture images and video. But this cannot be interfaced with the application to be
developed. Thus we capture the video stream from the camera in Visual C++ using
Microsoft DirectShow. DirectShow is part of the Microsoft DirectX SDK, a set of
low-level application programming interfaces for creating games and other
high-performance multimedia applications. DirectShow automatically detects and uses
audio and video acceleration whenever available. The captured video stream is displayed
at the center of the user interface. The video stream is given as input to the face detection
module. The code for webcam capture is given in Figure 3.3.
// Capture from the default camera
CvCapture* capture = cvCaptureFromCAM(-1);
// Grab the next frame and retrieve it as an IplImage
cvGrabFrame( capture );
IplImage* frame = cvRetrieveFrame( capture );
// Allocate frame_copy with the same size and format as the frame
if( !frame_copy )
    frame_copy = cvCreateImage( cvSize(frame->width, frame->height),
                                IPL_DEPTH_8U, frame->nChannels );
The facial movements of the user are captured from the web camera and given to
the face detector module. The algorithm used in our system is the Multi-view Face
Detection and Recognition Algorithm using Haar-like Features. Haar-like features are
digital image features used in object recognition. They owe their name to their intuitive
similarity with Haar wavelets. The feature set considers rectangular regions of the image
and sums up the pixels in each region. This sum is used to categorize images: all images
whose Haar-like feature in a given rectangular region falls within a certain range of
values form one category, and those falling outside this range form another. This
roughly divides the set of images into those containing faces and those that do not.
Once the face has been detected, a coloured box is drawn around the face to localize it.
The algorithm constantly localizes the face in the dynamic video stream.
// Load the trained frontal-face Haar cascade
const char* cascade_name = "haarcascade_frontalface_alt.xml";
CvHaarClassifierCascade* cascade =
    (CvHaarClassifierCascade*)cvLoad( cascade_name, 0, 0, 0 );
CvMemStorage* storage = cvCreateMemStorage(0);
// Create a scaled-down image based on the input image
IplImage* temp = cvCreateImage( cvSize(img->width/scale, img->height/scale), 8, 3 );
// Detect faces in the frame
CvSeq* faces = cvHaarDetectObjects( img, cascade, storage, 1.1, 2,
                                    CV_HAAR_DO_CANNY_PRUNING, cvSize(40, 40) );
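The rectangular sums underlying these features can be computed in constant time from
an integral image. The following sketch illustrates the principle with a simple
two-rectangle (left minus right) feature; it is a simplified illustration of the idea,
not the exact feature set of the trained cascade.

// Sketch of the rectangular-sum idea behind Haar-like features. The
// integral image (from cvIntegral) is one pixel wider and taller than the
// source, so any rectangle sum needs only four lookups.
#include <cv.h>

double rect_sum( IplImage* integral, CvRect r )
{
    double a = cvGetReal2D( integral, r.y, r.x );
    double b = cvGetReal2D( integral, r.y, r.x + r.width );
    double c = cvGetReal2D( integral, r.y + r.height, r.x );
    double d = cvGetReal2D( integral, r.y + r.height, r.x + r.width );
    return d - b - c + a;
}

// Two-rectangle edge feature: sum of the left half minus the right half
double haar_edge_feature( IplImage* integral, CvRect r )
{
    CvRect left  = cvRect( r.x, r.y, r.width / 2, r.height );
    CvRect right = cvRect( r.x + r.width / 2, r.y, r.width / 2, r.height );
    return rect_sum( integral, left ) - rect_sum( integral, right );
}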
Tracker module:
The face detector module draws a square around the localized face. The
coordinates of the square are passed as coordinates to the SetCursor function. This
enables the mouse pointer to move when the user moves his/her face. The coordinates are
multiplied by a scaling factor in order to amplify mouse movement. The mouse clicking
function is implemented using a time delay: when the mouse pointer hovers over a
button for a specified time, the button gets clicked. The code snippet for mouse control is
given in Figure 3.5.
// Face bounding-box coordinates, scaled back to full image size
pt1.x = r->x * scale;
pt1.y = r->y * scale;
pt2.x = (r->x + r->width) * scale;
pt2.y = (r->y + r->height) * scale;
// Amplify the face position so small head movements span the screen
pt3.x = pt1.x * 7;
pt3.y = pt1.y * 7;
// Move the mouse pointer to the amplified position
SetCursorPos( pt3.x, pt3.y );
// Simulate a left-button click at the current pointer position
mouse_event( MOUSEEVENTF_LEFTDOWN, 0, 0, 0, GetMessageExtraInfo() );
mouse_event( MOUSEEVENTF_LEFTUP, 0, 0, 0, GetMessageExtraInfo() );
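The hover-based clicking logic itself can be sketched as follows; the helper name
dwell_click_update and the dwell and jitter values are illustrative assumptions rather
than the exact values used in our implementation.

// Sketch of dwell clicking: called once per frame with the current
// pointer position; DWELL_MS and JITTER_PX are illustrative values
#include <windows.h>
#include <stdlib.h>

#define DWELL_MS  2000   // hover time required to trigger a click
#define JITTER_PX 15     // movement below this still counts as hovering

void dwell_click_update( POINT cur )
{
    static POINT anchor = { -10000, -10000 };
    static DWORD start  = 0;

    if( abs(cur.x - anchor.x) > JITTER_PX || abs(cur.y - anchor.y) > JITTER_PX )
    {
        anchor = cur;                 // pointer moved: restart the dwell timer
        start  = GetTickCount();
    }
    else if( GetTickCount() - start >= DWELL_MS )
    {
        // Pointer has hovered long enough: issue a left click
        mouse_event( MOUSEEVENTF_LEFTDOWN, 0, 0, 0, GetMessageExtraInfo() );
        mouse_event( MOUSEEVENTF_LEFTUP,   0, 0, 0, GetMessageExtraInfo() );
        start = GetTickCount();       // avoid repeated clicks while hovering
    }
}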
Application Interface:
Message board - enables the user to display small messages to express their needs.

Figure 3.7: Message board
3.4 IMPLEMENTATION
Microsoft Visual C++ 2005 provides a powerful and flexible development
environment for creating Microsoft Windows–based and Microsoft .NET–based
applications. It can be used as an integrated development system, or as a set of individual
tools. Visual C++ comprises the following components:
The Visual C++ 2005 compiler tools - The compiler has new features supporting
developers who target virtual machine platforms like the Common Language Runtime
(CLR). There are now compilers targeting x64 and Itanium. The compiler continues to
support targeting x86 machines directly, and optimizes performance for both platforms.
The Visual C++ 2005 Libraries - These include the industry-standard Active
Template Library (ATL), the MFC libraries, and standard libraries such as the Standard
C++ Library and the C RunTime Library, which has been extended to provide security-
enhanced alternatives to functions known to pose security issues. A new library, the C++
Support Library, is designed to simplify programs that target the CLR.
The Visual C++ 2005 Development Environment - Although the C++ compiler
tools and libraries can be used from the command-line, the development environment
provides powerful support for project management and configuration (including better
support for large projects), source code editing, source code browsing, and debugging
tools. This environment also supports IntelliSense, which makes informed, context-
sensitive suggestions as code is being authored.
The Intel Open Source Computer Vision (OpenCV) library is a computer vision
library originally developed by Intel. It is free for commercial and research use under a
BSD license. The library is cross-platform, and runs on Windows, Mac OS X, Linux,
PSP, VCRT (Real-Time OS on Smart camera) and other embedded devices. It focuses
mainly on real-time image processing; as such, if it finds Intel's Integrated Performance
Primitives on the system, it will use these commercially optimized routines to accelerate
itself. Officially launched in 1999, the OpenCV project was initially an Intel Research
initiative to advance CPU-intensive applications, part of a series of projects including
real-time ray tracing and 3D display walls. The library is mainly written in C, which
makes it portable to specific platforms such as digital signal processors. Wrappers
for languages such as C# and Python have been developed to encourage
adoption by a wider audience. Our system makes use of some functions present in this
library in the form of DLLs.
Microsoft DirectShow:
As noted under the webcam capture module, Microsoft DirectShow is used to access the
real-time video stream from the camera and pass it to the face detection module.
Working of the Algorithm:
The algorithm used in our system is the Multi-view Face Detection and
Recognition Algorithm using Haar-like Features. This algorithm is designed for still
images. It has been modified to detect faces from streaming video feeds.
The overall algorithm is depicted in Figure 3.8. The detection technique is based
on the idea of a wavelet template that defines the shape of an object in terms of a subset
of the wavelet coefficients of the image.
The input image is scanned across location and scale using a scaling factor of 1.1.
At each location an independent decision is made regarding the presence of a face.
This leads to a large number of classifier evaluations. Each classifier is a simple function
of rectangular sums followed by a threshold.
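The scan across location and scale can be pictured as the loop below; the 24x24 base
window, the step size and the functions window_is_face and record_detection are
illustrative stand-ins for the cascade evaluation described next.

// Sketch of the scan across location and scale with a 1.1 scale factor
int  window_is_face( int x, int y, int w, int h );    // assumed predicate
void record_detection( int x, int y, int w, int h );  // assumed callback

void scan_image( int img_w, int img_h, int step )
{
    for( double s = 1.0; 24 * s <= img_w && 24 * s <= img_h; s *= 1.1 )
    {
        int w = (int)(24 * s);        // current window size (24x24 base)
        int h = (int)(24 * s);
        for( int y = 0; y + h <= img_h; y += step )
            for( int x = 0; x + w <= img_w; x += step )
                if( window_is_face( x, y, w, h ) )
                    record_detection( x, y, w, h );
    }
}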
In each round of boosting, one feature is selected: the one with the lowest weighted
error. In subsequent rounds, incorrectly labeled examples are given a higher weight while
correctly labeled examples are given a lower weight. In order to reduce the false positive
rate while preserving efficiency, classification is divided into a cascade of classifiers. The
input is passed from one classifier to the next as long as each classifier classifies the
window as a face.
An input window is evaluated on the first classifier of the cascade; if that
classifier returns false, computation on that window ends and the detector returns false.
If the classifier returns true, the window is passed on to the next classifier in the
cascade, which evaluates it in the same way. The more a window looks like a face, the
more classifiers are evaluated on it and the longer it takes to classify the window.
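Structurally, this early-rejection behaviour can be sketched as follows, with the stage
classifiers as stand-ins for the trained functions of rectangular sums and thresholds.

// Sketch of cascade evaluation: reject a window as soon as any stage fails
#include <stdbool.h>

typedef bool (*StageClassifier)( const double* window_features );

bool cascade_classify( const double* window_features,
                       StageClassifier* stages, int num_stages )
{
    for( int i = 0; i < num_stages; ++i )
    {
        // Most non-face windows exit here after only a few cheap evaluations
        if( !stages[i]( window_features ) )
            return false;
    }
    // Only windows that pass every stage are reported as faces
    return true;
}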
3.5 TESTING
Testing is the process of evaluating the correctness, the quality and the
completeness of the system developed. Our system was tested with a variety of subjects,
and it was found that the system was able to detect faces successfully in all cases.
The application is also able to pick out faces from considerably large distances. The user
requires some training in order to move the mouse efficiently. Face detection is found to
be efficient even with a normal web camera and under ordinary lighting conditions.
However, care should be taken to align the web camera with the facial region of the user
for optimum face detection.
4. APPLICATIONS AND FUTURE ENHANCEMENT
Our system is mainly targeted towards physically disabled people who are
quadriplegic and non-verbal and bed ridden patients. But this human computer interface
has other applications as well. It can be used as an alternative to the traditional mouse and
keyboard. It can be used to control the entire computer, browse the internet, prepare
documents etc. As the system is relatively inexpensive, it can be installed in hospitals as
a communication system for patients. The system may also be used as a hands-free
navigation device to access a computer. This facilitates multitasking. For example, a
doctor while performing a surgery can make use of this system to issue commands to a
computer.
The system can be enhanced with higher-resolution cameras or infrared cameras to
improve face detection. It can be interfaced with external mobile devices to enhance the
communication part. The system can be enhanced for use in biometric security systems.
5. CONCLUSION
The objective of this project is to provide an automated system which will capture
the facial movements of the target user and correlate it with mouse pointer movements on
the screen. The developed interface will enable quadriplegic and non-verbal users to
access a computer.
A system has been developed for use by disabled people and bedridden patients. A
webcam interface captures the facial movements of the user. A face detection algorithm
is implemented and integrated with mouse movements on the screen. The system has been
integrated with four functions to aid physically challenged people. An emergency button
is provided for raising an alarm. Clicking on the audio button plays audio files for
entertainment. The video button is used to play videos for entertainment. An onscreen
message board has been provided for communication purposes. It helps the users to
display short messages to express their needs. The future focus is on enabling the system
to incorporate certain hardware based interfaces such as moving a robot.
APPENDIX A – SCREENSHOTS
MAIN INTERFACE
FACE TRACKING 1
FACE TRACKING 2
FACE TRACKING 3
FACE TRACKING 4
FACE TRACKING FOR A BED RIDDEN USER
PLAYING VIDEO
MESSAGE BOARD
REFERENCES
1. Gary R. Bradski (1998), “Computer Vision Face Tracking for Use in a Perceptual
User Interface”, Intel Technical Journal Q2 ’98, Microcomputer Research Lab,
Santa Clara, CA, Intel Corporation.
2. James Gips, Margrit Betke and Peter Fleming (2000), “The Camera Mouse: Preliminary
Investigation of Automated Visual Tracking For Computer Access”, Computer
Science Department, Boston College, Chestnut Hill, MA 02467.
3. James Gips, Margrit Betke and Peter Fleming (2002), “The Camera Mouse: Visual
Tracking of Body Features to Provide Computer Access for People With Severe
Disabilities”, IEEE Transactions on Neural Systems and Rehabilitation
Engineering, Vol. 10, No. 1.
4. Rajesh Kumar, Anupam Kumar (2008), “Black Pearl: An Alternative for Mouse
and Keyboard”, ICGST-GVIP, ISSN 1687-398X, Volume (8), Issue (III).