Air Canvas Synopsis
SESSION: 2018-22
SUBMITTED BY
NAME COLL. ROLL NO. REG. NO
MANISH RAJ 18CS01 18105109011
AYUSH KUMAR 18CS65 18105109035
DEEPAK KUMAR 18CS64 18105109010
ANURAG KUMAR NIRALA 18CS74 18105109038
This is to certify that the project work entitled “AIR CANVAS” has been developed
by MANISH RAJ (Roll No. 18CS01), AYUSH KUMAR (Roll No. 18CS65),
DEEPAK KUMAR (Roll No. 18CS64) and ANURAG KUMAR NIRALA (Roll No. 18CS74),
Session 2018-22, B.Tech (CSE), from Nalanda College of Engineering,
Chandi (Nalanda), under our supervision and guidance. This project report is
recommended for acceptance for examination and evaluation.
DECLARATION
We, MANISH RAJ, AYUSH KUMAR, ANURAG KUMAR
NIRALA and DEEPAK KUMAR, students of B.Tech in the Department of
Computer Science & Engineering, Nalanda College of Engineering,
Chandi (Nalanda), declare that the work presented in this major
project is the outcome of our own work, is bona fide and correct to the
best of our knowledge, and that no other copy of this project exists
anywhere. If any such copy is unfortunately found to exist, its holder
is presumed to have copied this project.
MANISH RAJ
AYUSH KUMAR
DEEPAK KUMAR
ANURAG KUMAR NIRALA
ACKNOWLEDGEMENT
I owe a great many thanks to the many people who helped and
supported me during the writing of this project report. My deepest
thanks to Prof. Priyanka Sinha, guide of the project, for guiding me and
correcting my various documents with attention and care. She has
taken pains to go through project phases I and II and make the necessary
corrections as and when needed. I extend my gratitude to our Professor and
Head of Department, Prof. Priyanka Sinha, for her cooperation
throughout the semester. I have also received a tremendous amount of
help from friends inside and outside the college, and I want to thank
all my friends for constantly lending a helping hand, sharing
their ideas and, most important of all, their friendship. Last but not
least, I want to express my gratefulness to my family. My parents
have constantly encouraged me to pursue this degree. Thank
you all for your support, patience and love.
CONTENTS
1. INTRODUCTION
2. PROBLEM DEFINITION
3. PROPOSED SOLUTION
4. ALGORITHM AND WORKFLOW
5. CHALLENGES IDENTIFIED
   5.1 Fingertip Detection
   5.2 Lack of Pen Up and Pen Down Motion
   5.3 Controlling the Real-Time System
6. PROJECT DESIGN
   6.1 Colour Tracking
   6.2 Contour Detection
   6.3 Drawing the Line
   6.4 Drawing the Points
7. SYSTEM METHODOLOGY
   7.1 Fingertip Detection Model
   7.2 Techniques of Fingertip Recognition Dataset Creation
   7.3 Fingertip Recognition Model Training
I. INTRODUCTION
Air Canvas helps you draw on a screen just by waving your finger fitted with a colourful
tip or a simple coloured cap. We will be using the computer vision techniques of OpenCV to
build this project. The preferred language is Python, due to its exhaustive libraries and
easy-to-use syntax, but once the basics are understood it can be implemented in any
OpenCV-supported language.
In the digital era, the traditional art of writing is being replaced by digital art.
Digital art refers to forms of expression and transmission of art in digital form;
reliance on modern science and technology is its distinctive characteristic.
Traditional art refers to the art forms created before digital art. From the recipient's
perspective, it can be divided into visual art, audio art, audio-visual art and audio-visual
imaginary art, which include literature, painting, sculpture, architecture, music, dance, drama
and other works of art. Digital art and traditional art are interrelated and interdependent.
Social development is driven not by individual will but by the needs of human life, and the
same holds in art. In the present circumstances, digital art and traditional art exist in a
symbiotic state, so we need to systematically understand the relationship between them.
The traditional way of writing includes pen and paper, and chalk and board. The essential
aim of digital art here is to build a hand-gesture recognition system for writing digitally.
Digital writing can be done in many ways: using a keyboard, a touch-screen surface, a digital
pen or stylus, electronic hand gloves, etc. In this system, however, we use hand-gesture
recognition based on machine-learning algorithms in Python, which creates natural interaction
between man and machine. With the advancement of technology, the need for natural
‘human–computer interaction (HCI)’ systems to replace traditional input methods is
increasing rapidly.
Writing in the air has been one of the most fascinating and challenging research areas in the
fields of image processing and pattern recognition in recent years. It contributes immensely
to the advancement of automation and can improve the interface between man and
machine in numerous applications.
Several research works have focused on new techniques and methods that reduce
processing time while providing higher recognition accuracy. Object tracking is considered
an important task within the field of computer vision. The invention of faster computers,
the availability of inexpensive, good-quality video cameras and the demand for automated
video analysis have made object-tracking techniques popular.
Generally, video analysis has three major steps: first, detecting the object;
second, tracking its movement from frame to frame; and lastly, analysing the behaviour of
that object. For object tracking, four different issues are taken into account: selection of a
suitable object representation, feature selection for tracking, object detection and object
tracking. In the real world, object-tracking algorithms are a primary part of applications
such as automatic surveillance, video indexing and vehicle navigation.
The project takes advantage of this gap and focuses on developing a motion-to-text converter
that can potentially serve as software for intelligent wearable devices that write in the air.
The project recognizes gestures: it uses computer vision to trace the path of the finger.
The generated text can also be used for various purposes, such as sending messages and
emails. It can be a powerful means of communication for the deaf, and an effective
communication method that reduces mobile and laptop usage by eliminating the need
to write.
Ever thought that waving your finger in the air could draw on a real canvas? This is how
Air Canvas works as a computer vision project: it helps you draw on a screen just by
waving a finger fitted with a colourful tip or a simple coloured cap, and it is OpenCV that
makes such computer vision projects possible. The proposed method provides natural
human-system interaction in that it requires no keypad, stylus, pen or glove
for character input.
II. PROBLEM DEFINITION
1. People with hearing impairment: Although we take hearing and speech for granted,
people with hearing impairment communicate using sign languages. Most of the world
cannot understand their feelings and emotions without a translator in between.
2. Paper wastage is not scarce news. We waste a lot of paper in scribbling, writing,
drawing, etc. Some basic facts: about 5 litres of water on average are required to make
one A4-size sheet, 93% of writing paper comes from trees, 50% of business waste is paper,
25% of landfill is paper, and the list goes on. Paper wastage harms the environment by
consuming water and trees and creating tons of garbage.
III. PROPOSED SOLUTION
Air writing can quickly address these issues. It acts as a communication tool for people with
hearing impairment: their air-written text can be presented using AR or converted to speech.
One can quickly write in the air and continue working without much distraction.
Additionally, writing in the air requires no paper; everything is stored electronically.
IV. ALGORITHM AND WORKFLOW
This is the most exciting part of our system. Writing involves many functionalities, so the
number of gestures used for controlling the system equals the number of actions involved.
The basic functionalities included in our system are:
1. Writing Mode – In this state, the system traces the fingertip coordinates and stores
them.
2. Colour Mode – The user can change the colour of the text among the various available
colours.
The workflow is:
1. Start reading the frames and convert each captured frame to the HSV colour space
(easy for colour detection).
2. Prepare the canvas frame and put the respective ink buttons on it.
3. Adjust the trackbar values to find the mask of the coloured marker.
4. Detect the contours, find the centre coordinates of the largest contour and keep storing
them in an array across successive frames (arrays of drawing points on the canvas).
5. Finally, draw the points stored in the array on the frames and the canvas.
V. CHALLENGES IDENTIFIED
1. Fingertip detection
The existing system works only with your bare finger; there are no highlighters, paints or
similar aids. Identifying and characterizing an object such as a finger from an RGB image
without a depth sensor is a great challenge.
2. Lack of pen up and pen down motion
The system uses a single RGB camera, with writing captured from above. Since depth sensing
is not possible, up and down pen movements cannot be followed. Therefore, the fingertip's
entire trajectory is traced, and the resulting image would be absurd and not recognized by
the model. The difference between a hand-written and an air-written ‘G’ is shown in Figure 1.
3. Controlling the real-time system
Using real-time hand gestures to change the system from one state to another requires a lot
of care in the code. Also, the user must know many gestures to control the system adequately.
VI. PROJECT DESIGN
Ever wanted to draw your imagination just by waving your finger in the air? Here we build
an Air Canvas that can draw anything on screen just by capturing the motion of a coloured
marker with a camera; a coloured object at the tip of the finger serves as the marker.
Colour detection and tracking are used to achieve this objective. The colour marker is
detected and a mask is produced. Further steps perform morphological operations on the
mask, namely erosion and dilation: erosion removes the impurities present in the mask, and
dilation then restores the eroded main mask.
STEPS IN DETAIL:
1. Colour tracking of the object at the fingertip. First of all, the incoming image from the
webcam is converted to the HSV colour space, which is very well suited to colour tracking,
in order to detect the coloured object at the tip of the finger. Trackbars are then created to
tune the HSV values to the colour range of the object placed on the finger. Once the
trackbars are set up, their real-time values are read and a range is created. This range is a
pair of NumPy arrays passed to the function cv2.inRange(), which returns a mask of the
coloured object: a black-and-white image with white pixels at the positions of the desired
colour.
2. Contour detection on the mask of the coloured object. After obtaining the mask, the next
step is to locate its centre position for drawing the line. Some morphological operations are
performed on the mask to free it of impurities and make the contour easy to detect.
3. Drawing the line using the position of the contour. Now comes the real logic behind this
computer vision project: a Python deque (a data structure) stores the position of the contour
on each successive frame, and these stored points are used to draw a line with OpenCV's
drawing functions. The position of the contour also decides whether we want to click a
button or draw on the sheet. Some buttons are arranged at the top of the canvas; if the
pointer enters their area, their action is triggered. There are five buttons on the canvas,
drawn using OpenCV.
➢ Clear : Which clears the screen by emptying the deques.
➢ Red : Changes the marker to red color using color array.
➢ Green : Changes the marker to Green color using color array.
➢ Yellow : Changes the marker to Yellow color using color array.
➢ Blue : Changes the marker to Blue color using color array.
Also, to avoid drawing when no contour is present, an else condition captures that case.
Finally, all the points stored in the deques are drawn at their positions with the respective
colours.
VII. SYSTEM METHODOLOGY
1. Fingertip Detection Model
This system needs a dataset for the fingertip detection model, whose primary purpose is to
record the motion, i.e., the air character. Air writing can be achieved merely by using a stylus
or air pens that have a unique colour [2]. This system, though, makes use of the fingertip:
we believe people should be able to write in the air without the pain of carrying a stylus.
We use deep-learning algorithms to detect the fingertip in every frame, generating a list of
coordinates.
2. Techniques of Fingertip Recognition Dataset Creation
a. Video to Images: In this approach, two-second videos of a person's hand motion were
captured in different environments. These videos were then broken into 30 separate images,
as shown in Figure 3. We collected 2000 images in total. This dataset was labelled manually
using LabelImg [13]. The best model trained on this dataset yielded an accuracy of 99%.
However, since the 30 generated images came from the same video and the same environment,
the dataset was monotonous; hence, the model did not work well on backgrounds different
from those in the dataset.
b. Take Pictures in Distinct Backgrounds: To overcome the drawback caused by the lack of
diversity in the previous method, we created a new dataset. This time, we were aware that
we needed some gestures to control the system, so we collected the four distinct hand poses
shown in Figure 3.
The idea was to make the model capable of efficiently recognizing the fingertips of all four
fingers. This allows the user to control the system by the number of fingers he or she shows:
write by showing one index finger, convert this writing motion to e-text by showing two
fingers, add a space by showing three fingers, hit backspace by showing five fingers, enter
prediction mode by showing four fingers, and then show 1, 2 or 3 fingers to select the 1st,
2nd or 3rd prediction respectively. To get out of prediction mode, show five fingers again.
This dataset consisted of 1800 images. Using a script, the previously trained model was made
to auto-label this dataset. We then corrected the mislabelled images and trained another
model, achieving 94% accuracy. Contrary to the former one, this model worked well in
different backgrounds.
3. Fingertip Recognition Model Training
Once the dataset was ready and labelled, it was divided into train and dev sets (85%-15%).
We used Single Shot Detector (SSD) and Faster R-CNN pre-trained models to train on our
dataset. Faster R-CNN was much better than SSD in terms of accuracy; please refer to the
Results section for more information. SSD combines the two standard object-detection
modules – one that proposes regions and one that classifies them – into a single network,
which speeds up performance as objects are detected in a single shot; it is commonly used
for real-time object detection. Faster R-CNN computes region proposals with a Region
Proposal Network over the backbone's output feature map; the evaluated proposals are
passed to a Region of Interest pooling layer, and the result is finally given to two fully
connected layers for classification and bounding-box regression [15]. We tuned the last
fully connected layer of Faster R-CNN to recognize the fingertip in the image.
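The 85%-15% train/dev split mentioned above can be sketched as follows; the filenames are placeholders, not the actual dataset.

```python
import random

# Placeholder filenames standing in for the 1800 labelled images.
images = [f"img_{i:04d}.jpg" for i in range(1800)]

random.seed(42)                    # reproducible shuffle
random.shuffle(images)

cut = len(images) * 85 // 100      # 85% train, 15% dev
train, dev = images[:cut], images[cut:]
print(len(train), len(dev))        # 1530 270
```

Shuffling before cutting matters here: consecutive frames from the same video are highly correlated, so an unshuffled split would leak near-duplicates between train and dev.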
RAM : 8GB

1. PYTHON
Python is used for:
i. web development (server-side),
ii. software development,
iii. mathematics,
iv. system scripting.
Why Python?
• Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
• Python has a simple syntax similar to the English language.
• Python has syntax that allows developers to write programs with fewer lines than
some other programming languages.
• Python runs on an interpreter system, meaning that code can be executed as soon
as it is written. This means that prototyping can be very quick.
• Python can be treated in a procedural way, an object-oriented way or a functional
way.
Good to know
• The most recent major version of Python is Python 3, which is used in this
project. However, Python 2, although no longer updated with anything other
than security patches, is still quite popular.
• Python can be written in a simple text editor, or in an Integrated Development
Environment such as Thonny, PyCharm, NetBeans or Eclipse, which are
particularly useful when managing larger collections of Python files.
Python Syntax compared to other programming languages
• Python was designed for readability, and has some similarities to the English
language with influence from mathematics.
• Python uses new lines to complete a command, as opposed to other programming
languages which often use semicolons or parentheses.
• Python relies on indentation, using whitespace, to define scope, such as the scope
of loops, functions and classes. Other programming languages often use curly
brackets for this purpose.
2. NUMPY
What is NumPy?
• NumPy is a Python library used for working with arrays.
• It also has functions for working in the domains of linear algebra, Fourier
transforms and matrices.
• NumPy was created in 2005 by Travis Oliphant. It is an open-source project and
you can use it freely.
• NumPy stands for Numerical Python.
• The array object in NumPy is called ndarray; it provides a lot of supporting
functions that make working with ndarray very easy.
• Arrays are used very frequently in data science, where speed and resources are
very important.
• Data science is a branch of computer science that studies how to store, use
and analyze data to derive information from it.
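As a tiny illustration of the above, the HSV colour-range bounds used elsewhere in this project are exactly this kind of ndarray (the numbers are arbitrary examples).

```python
import numpy as np

# The colour-range bounds passed to cv2.inRange() are small ndarrays.
lower = np.array([64, 72, 49])
upper = np.array([153, 255, 255])

print(type(lower).__name__)         # the array object is called ndarray
print((upper - lower).tolist())     # elementwise arithmetic on arrays
```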
3. OPENCV
OpenCV is a huge open-source library for computer vision, machine learning and
image processing, and it now plays a major role in real-time operation, which is very
important in today's systems. Using it, one can process images and videos to identify
objects, faces, or even human handwriting. When integrated with libraries such as
NumPy, Python is capable of processing the OpenCV array structure for analysis.
To identify image patterns and their various features, we use vector spaces and perform
mathematical operations on these features.
The first OpenCV version was 1.0. OpenCV is released under a BSD license and is hence
free for both academic and commercial use. It has C++, C, Python and Java interfaces
and supports Windows, Linux, Mac OS, iOS and Android. When OpenCV was designed,
the main focus was computational efficiency for real-time applications, so everything is
written in optimized C/C++ to take advantage of multi-core processing.
Applications of OpenCV: Many problems are solved using OpenCV; some of them are
listed below:
• Face recognition
• Automated inspection and surveillance
• People counting (foot traffic in a mall, etc.)
• Vehicle counting on highways, along with their speeds
• Interactive art installations
• Anomaly (defect) detection in manufacturing processes (the odd defective products)
• Street-view image stitching
• Video/image search and retrieval
• Robot and driverless-car navigation and control
• Object recognition
• Medical image analysis
• Movies – 3D structure from motion
• TV channel advertisement recognition
4. VISUAL STUDIO
First, you have to download and install Visual Studio; for that, you can refer to
Downloading and Installing Visual Studio 2019. Don't forget to select the .NET Core workload
during the installation of VS 2019; if you forget, you will have to modify the installation.
You will see a number of tool windows when you open Visual Studio and start writing
your first program, as follows:
Figure 6
Output Window: Here Visual Studio shows outputs, compiler warnings, error
messages and debugging information.
Solution Explorer: Shows the files on which the user is currently working.
Properties: Gives additional information and context about the selected parts of the
current project.
A user can also add windows as required by choosing them from the View menu. In Visual
Studio the tool windows are customizable: a user can add more windows, remove an
open one, or move windows around to best suit the workflow.
Various Menus in Visual Studio: A user can find a lot of menus at the top of the Visual
Studio screen, as shown below.
Figure 7
The File menu contains commands for creating, opening and saving projects.
The Edit menu contains commands for searching, modifying and refactoring code.
The View menu is used to open additional tool windows in Visual Studio.
The Project menu is used to add files and dependencies to the project.
The Tools menu is used to change settings, add functionality to Visual Studio via
extensions, and access various Visual Studio tools.
The menu below is known as the toolbar, which provides quick access to the most
frequently used commands. You can add and remove commands by going to View → Customize.
Figure 8
Note:
Support for different programming languages in Visual Studio is added using a special
VSPackage known as a Language Service. When you install Visual Studio, the
functionality coded as a VSPackage becomes available as a service.
The Visual Studio IDE provides three different types of services, known as SVsSolution,
SVsUIShell and SVsShell.
The SVsSolution service provides the functionality to enumerate solutions and projects in
Visual Studio.
The SVsUIShell service provides user-interface functionality such as toolbars, tabs, etc.
X. SCREENSHOTS
Figure 9
Figure 10
XI. CODING
import numpy as np
import cv2
from collections import deque

# Deques that store the drawn points for each colour
bpoints = [deque(maxlen = 512)]
gpoints = [deque(maxlen = 512)]
rpoints = [deque(maxlen = 512)]
ypoints = [deque(maxlen = 512)]

# Index of the deque currently being written to, per colour
blue_index = green_index = red_index = yellow_index = 0

colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 255, 255)]
colorIndex = 0

# White canvas window
paintWindow = np.zeros((471, 636, 3)) + 255
cv2.namedWindow('Paint', cv2.WINDOW_AUTOSIZE)

cap = cv2.VideoCapture(0)

# Keep looping
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # ... (HSV conversion, trackbar-based masking, morphological
    # operations and contour detection are omitted here; 'center'
    # holds the centre of the largest contour, or None) ...
    if center is not None and center[1] <= 65:
        # Clear Button
        if 40 <= center[0] <= 140:
            bpoints = [deque(maxlen = 512)]
            gpoints = [deque(maxlen = 512)]
            rpoints = [deque(maxlen = 512)]
            ypoints = [deque(maxlen = 512)]
            blue_index = green_index = red_index = yellow_index = 0
            paintWindow[67:, :, :] = 255
        elif 160 <= center[0] <= 255:
            colorIndex = 0  # Blue
        elif 275 <= center[0] <= 370:
            colorIndex = 1  # Green
        elif 390 <= center[0] <= 485:
            colorIndex = 2  # Red
        elif 505 <= center[0] <= 600:
            colorIndex = 3  # Yellow
    else:
        if colorIndex == 0:
            bpoints[blue_index].appendleft(center)
        elif colorIndex == 1:
            gpoints[green_index].appendleft(center)
        elif colorIndex == 2:
            rpoints[red_index].appendleft(center)
        elif colorIndex == 3:
            ypoints[yellow_index].appendleft(center)

    # Draw all the stored points on the frame and the canvas
    points = [bpoints, gpoints, rpoints, ypoints]
    for i in range(len(points)):
        for j in range(len(points[i])):
            for k in range(1, len(points[i][j])):
                if points[i][j][k - 1] is None or points[i][j][k] is None:
                    continue
                cv2.line(frame, points[i][j][k - 1], points[i][j][k], colors[i], 2)
                cv2.line(paintWindow, points[i][j][k - 1], points[i][j][k], colors[i], 2)

    cv2.imshow('Paint', paintWindow)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
XII. CONCLUSION
The system has the potential to challenge traditional writing methods. It eradicates the need
to carry a mobile phone in hand to jot down notes, providing a simple on-the-go way to do
the same. It will also serve a great purpose in helping specially-abled people communicate
easily. Even senior citizens, or people who find it difficult to use keyboards, will be able to
use the system effortlessly. Extending the functionality, the system can also be used to
control IoT devices in the near future. Drawing in the air can also be made possible. The
system will be excellent software for smart wearables with which people could better
interact with the digital world. Augmented Reality can make the text come alive. There are
some limitations of the system which can be improved in the future. Firstly, using a
handwriting recognizer in place of a character recognizer will allow the user to write word
by word, making writing faster. Secondly, hand gestures with a pause can be used to control
the real-time system, as done by [1], instead of using the number of fingertips. Thirdly, our
system sometimes recognizes fingertips in the background and changes their state;
air-writing systems should obey only their master's control gestures and should not be
misled by people around. Also, we used the EMNIST dataset, which is not a proper
air-character dataset. Upcoming object-detection algorithms such as YOLO v3 can improve
fingertip recognition accuracy and speed. In the future, advances in Artificial Intelligence
will further enhance the efficiency of air writing.