FACE MASK DETECTION USING DEEP LEARNING

Mohammed Rizwan
(Jamia Hamdard University)

INTRODUCTION
The practice of wearing face masks has increased because of COVID-19. Before COVID-19, people wore face masks mainly to protect themselves from air pollution, while some wore them to protect their facial skin. COVID-19 has badly affected lifestyles all over the world. According to researchers, a face mask can help protect us from the virus: it blocks respiratory droplets during breathing and thereby helps block transmission. It has also been noted that masks played a major role before COVID-19 in stopping the spread of other diseases, such as tuberculosis. COVID-19 has badly affected the economies of many countries as well.

Governments all over the world are taking the necessary steps to stop the spread. AI and machine learning can assist in battling COVID-19 in many ways. Here, we present a solution for identifying whether a face is covered by a mask. The model is built using computer vision, deep learning, and machine learning, and it is trained to identify whether an individual is covering his or her face with a mask or not. It can be used with cameras placed in public places such as markets, malls, parks, and metro stations. The model combines deep learning with traditional machine learning strategies using OpenCV, TensorFlow, and Keras, and it can be used for real-time detection on camera feeds.

1. SOFTWARE DESCRIPTION
1.1 SOFTWARE AND TEXT EDITOR

1.1.1 Jupyter Notebook

Jupyter Notebook is an open-source web application (IDE) used to create and share Jupyter documents. It allows data scientists to create and share documents that integrate live code, equations, computational output, visualizations, and other multimedia resources, along with explanatory text, in a single document. Jupyter notebooks support various languages that are popular in data science, such as Python, Julia, Scala, and R.



Fig.1. Jupyter Notebook

1.1.2 Sublime Text Editor

Sublime Text is a flexible, powerful programming text editor. For a person who does not want to switch to other windows for editing, code checking, debugging, and deployment, Sublime Text is a good option. It supports more than 70 file types, among them JavaScript, HTML, and CSS. The best part of Sublime Text is that you can customize almost everything, such as the color scheme and the text font. The feature that adds the most value is its speed. It is a highly configurable and extensible text editor, and another reason behind its performance is that it is tightly coded.

Fig.2. Sublime Text



1.2 FRAMEWORKS AND LIBRARIES

1.2.1 MACHINE LEARNING (ML)

Machine learning is a field of study that gives machines the ability to predict outcomes without being explicitly programmed. It is all about creating and implementing algorithms through which a machine receives data and uses that data to:

i. Make predictions

ii. Analyze patterns

iii. Give recommendations

The benefit of machine learning is that machines can discover patterns and learn more effectively than humans by using complex computational methods.

• Supervised learning: We have labeled data, so the target to be predicted is known in advance. The algorithm learns to predict a dependent variable from a given set of independent variables. Examples: regression, classification (a minimal sketch is given after this list).

• Unsupervised learning: The algorithm has no target to predict and works on unlabeled data. Example: clustering.

• Reinforcement learning: Similar to supervised learning, but instead of minimizing a loss we aim to maximize a reward. Examples: fraud detection, client retention.
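As a minimal, illustrative sketch of supervised learning (not part of the original report; it assumes scikit-learn and its bundled Iris dataset), a classifier is trained on labeled data and then used to make predictions:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labeled data: feature matrix X and target labels y.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)   # the supervised learner
model.fit(X_train, y_train)                 # learn the mapping from X to y
print("Test accuracy:", model.score(X_test, y_test))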

1.2.2 COMPUTER VISION

Computer vision (CV) is the field concerned with enabling computers to understand images; in deep learning it typically relies on convolutional neural networks and is used for image classification, object detection, and neural style transfer. A convolution works somewhat like scanning over an image with a magnifying glass or a filter. Computer vision is associated with modeling and replicating human vision using computer software and hardware, and it leads to the construction of explicit descriptions of physical objects from their images.

Applications:

Computer Vision (CV) has many wide-ranging applications:

1. Robotics

2. Security Systems

3. Medical Surgery



1.2.3 DEEP LEARNING

Deep learning is a subfield of machine learning used for large and complex datasets. It uses more advanced and efficient algorithms to tackle such problems effectively, and it makes the process of interpreting large amounts of data much faster and easier.

Types of deep learning:

Generally, deep learning networks are of three types:

1. Artificial Neural Network (ANN)

2. Convolutional Neural Network (CNN)

3. Recurrent Neural Network (RNN)

1.2.4 OpenCV

OpenCV is an open-source library whose name stands for Open Source Computer Vision. It was launched in 1999 and is written in C++, which makes the library quite fast and allows it to operate with little RAM. OpenCV has many modules, such as core functionality, calib3d, features2d, and highgui. OpenCV 1.0 was released in 2006 and OpenCV 2 in 2009.

Its applications include detecting specific objects, analyzing video and performing feature detection, street-view image analysis, and robot car navigation.
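As an illustration of the OpenCV API, a minimal sketch that loads an image and detects faces with the Haar cascade bundled with the library (the image path is hypothetical and not taken from the report):

import cv2

# Load an image from disk and convert it to grayscale for detection.
image = cv2.imread("sample.jpg")          # hypothetical file name
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Face detection with OpenCV's bundled Haar cascade classifier.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw a rectangle around each detected face and save the result.
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", image)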

1.2.5 TENSORFLOW

TensorFlow is an open-source machine learning library and framework designed by Google. It has the potential to train and run deep learning neural networks. Its main interface is in the Python language, so it is considered an easy-to-use framework.

Applications

1. Digit Classification

2. Image Recognition

3. Text Summarization

4. Sentiment Analysis, etc.
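A minimal sketch of the TensorFlow API (illustrative only, not taken from the report): it builds a variable, evaluates a simple function, and computes its gradient.

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x + 2.0 * x        # y = x^2 + 2x
grad = tape.gradient(y, x)     # dy/dx = 2x + 2 = 8.0 at x = 3
print(float(grad))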

1.2.6 KERAS

Keras is a deep learning framework written in the Python programming language and developed by François Chollet, an AI researcher at Google. It runs on top of TensorFlow and is a good choice for deep learning applications.

Features:

1. User-friendly framework.

2. Supports multiple platforms.

3. Can run on both CPU and GPU.

4. Great community support.

5. High computational power.
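A minimal Keras sketch (illustrative only; this is not the report's actual model) that defines and compiles a tiny two-class classifier for "mask" versus "no mask":

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# A small fully connected network over flattened 224x224 RGB images.
model = Sequential([
    Dense(128, activation="relu", input_shape=(224 * 224 * 3,)),
    Dense(2, activation="softmax"),      # two classes: mask / no mask
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()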

1.2.7 MobileNetV2

MobileNetV2 is a convolutional neural network (CNN) architecture that seeks to perform well on mobile devices. It is based on an inverted residual structure in which the residual connections are between the bottleneck layers. The ability to run deep networks on personal mobile devices improves user experience, offering anytime, anywhere access, with additional benefits for security, privacy, and energy consumption. It was developed by a group of Google researchers.

Fig.3. MobileNetV2 Block
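A minimal sketch of using MobileNetV2 as a pre-trained backbone for the two-class mask classifier in Keras (an assumption about the typical transfer-learning setup; not the report's exact code):

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Pre-trained MobileNetV2 backbone without its ImageNet classification head.
base = MobileNetV2(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                     # freeze the backbone for transfer learning

# Small classification head for "with mask" / "without mask".
x = GlobalAveragePooling2D()(base.output)
output = Dense(2, activation="softmax")(x)
model = Model(inputs=base.input, outputs=output)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])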



1.3 FLOW CHART

3. HARDWARE DESCRIPTION

Webcam:

A webcam is a digital video camera commonly built into a computer. Its main function is to transmit pictures over the internet; for example, we use it for video calls and for live-streaming video, and we can also use it to record video of ourselves. It is popularly used with instant messaging services. Its cost depends on its quality, starting from about 1,500 and going much higher. It is easily available in the market and can be installed easily. Many laptops and monitors have built-in or integrated webcams located at the top center of the screen; in many cases these built-in cameras are of low quality and have lower resolution.

Fig.4. Webcam
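A minimal sketch of reading frames from a webcam with OpenCV (illustrative; not the report's code):

import cv2

cap = cv2.VideoCapture(0)                 # 0 = default built-in webcam
while True:
    ok, frame = cap.read()                # grab one frame
    if not ok:
        break
    cv2.imshow("Webcam", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"): # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()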

4. Methodology
4.1 Dataset

The dataset we have used comprises 3,550 pictures in total, of which 1,805 are pictures with a face mask and 1,786 are pictures without a face mask. All the pictures in this dataset are real pictures, taken from different angles so that our model can identify faces clearly. The dataset for this project is taken from Kaggle, one of the best platforms for data science and machine learning. From each of the sources, the proportion of pictures is roughly equal.

We need to divide our dataset into three different sections: a training dataset, a test dataset, and a validation dataset. The training dataset is used to train our model, which trains itself on the different images, and the test dataset is used to test it. After training is complete, we use the test dataset to check whether the model is working well. Datasets can be divided in different ratios; in this project the dataset is divided 4:1, which means 80% of the data is used as the training dataset and the remaining 20% is used as the test dataset. The reason to use 80% of the data for training is that the more training data we have, the more accurate our model will be. The split proportion of the dataset is therefore 0.8:0.2.
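A minimal sketch of the 80/20 split described above, assuming the images and labels have already been loaded into arrays (the names images and labels are illustrative):

from sklearn.model_selection import train_test_split

# images and labels are assumed to be NumPy arrays prepared elsewhere,
# e.g. images.shape == (3550, 224, 224, 3) and labels.shape == (3550,).
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, stratify=labels, random_state=42
)
print(len(X_train), "training images,", len(X_test), "test images")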



4.2 Training

At training time, for each location we consider default bounding boxes of different sizes and aspect ratios, match them with the ground-truth boxes, and finally use the Intersection over Union (IoU) technique to pick the best matching box. IoU evaluates how much of our predicted box overlaps with the ground truth. Its values range from 0 to 1, and increasing values of IoU indicate a more accurate prediction, the best value being the highest IoU. The equation and pictorial depiction of IoU are given as follows:

IoU(B1, B2) = (B1 ∩ B2) / (B1 ∪ B2)

Fig.5. Pictorial representation of IoU.
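A small helper computing IoU for two axis-aligned boxes given as (x1, y1, x2, y2) corners (an illustrative sketch, not code from the report):

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)          # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                        # union area
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))                 # 25 / 175 ≈ 0.143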

5. Evaluation and Testing


5.1 METRICS

We tried three different base models for identifying 'mask' or 'no mask'. This exercise was done to find the best-fitting model for our situation. The evaluation process consists of first looking at the classification report, which gives us insight into precision, recall, and F1 score. The formulas for these metrics are as follows:

Precision = True Positives / (True Positives + False Positives)

Recall = True Positives / (True Positives + False Negatives)

Accuracy = (True Positives + True Negatives) / (Positives + Negatives)
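A small sketch producing these metrics from predicted labels with scikit-learn (illustrative; the report does not show its evaluation code, and the labels here are hypothetical):

from sklearn.metrics import accuracy_score, classification_report

# Hypothetical ground-truth and predicted labels: 1 = mask, 0 = no mask.
y_true = [1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 0, 0, 1, 1, 1, 1]
print(classification_report(y_true, y_pred, target_names=["no mask", "mask"]))
print("Accuracy:", accuracy_score(y_true, y_pred))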

Using these three metrics, we can conclude which model performs most efficiently. The second part consists of plotting the training loss, validation loss, training accuracy, and validation accuracy, which also proves helpful in choosing a final model. The results of the different choices are shown below.

5.2 LOSS/ACCURACY GRAPH

Fig.6. Training Loss and Accuracy
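A minimal matplotlib sketch of how such a loss/accuracy plot can be produced from a Keras training history (illustrative; the variable history is assumed to be the object returned by model.fit):

import matplotlib.pyplot as plt

# history is assumed to come from an earlier call such as history = model.fit(...).
epochs = range(1, len(history.history["loss"]) + 1)
plt.plot(epochs, history.history["loss"], label="train_loss")
plt.plot(epochs, history.history["val_loss"], label="val_loss")
plt.plot(epochs, history.history["accuracy"], label="train_acc")
plt.plot(epochs, history.history["val_accuracy"], label="val_acc")
plt.xlabel("Epoch")
plt.ylabel("Loss / Accuracy")
plt.legend()
plt.savefig("loss_accuracy.png")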



6. Implementation and Deployment

I implemented this model on different images containing one or more faces. I also implemented it on live video streams using my laptop camera in two situations:

1. Wearing a mask
2. Without wearing a mask

The results in both situations were captured as screenshots; a rough sketch of the live-stream detection loop is given below.
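The following is a rough sketch of how such real-time detection can be wired together, combining OpenCV face detection with the trained mask classifier (file names, thresholds, and the class order are illustrative assumptions, not the report's actual code):

import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Hypothetical file name for the trained mask classifier.
model = load_model("mask_detector.h5")
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
        # Crop the face, resize to the model's input size, and classify it.
        # The class order (mask, no mask) is assumed for this sketch.
        face = cv2.resize(frame[y:y + h, x:x + w], (224, 224)) / 255.0
        mask_prob, no_mask_prob = model.predict(face[np.newaxis, ...], verbose=0)[0]
        label = "Mask" if mask_prob > no_mask_prob else "No Mask"
        color = (0, 255, 0) if label == "Mask" else (0, 0, 255)
        cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
        cv2.putText(frame, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
    cv2.imshow("Face Mask Detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()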

Fig.7. Without Mask

Fig.8. With Mask



Fig.9. Both Without Mask

Fig.10. One With Mask and One Without Mask

7. Limitation

Though technology is bringing a revolution in the world, with many benefits such as automated cars and robots, it also has drawbacks. The first drawback of this project is security: the project works like a detective, capturing data about people, and if anyone gains access to this data, he or she can misuse it. Secondly, the resolution is limited, so in crowded public places such as malls and markets this project might fail. It also requires storage, such as a hard disk, to store the data, which makes it costly, and the cost of maintenance is high.



8. Future Scope

Custom alerts can be sent to a person with or without a face mask, or to one whose face is unrecognizable, through the admin system. There is no need to install any new hardware, as the system can be connected to the existing surveillance setup; it can be implemented easily with any camera or hardware such as surveillance cameras. The system restricts access for those not wearing masks and notifies the authorities. The face mask detection system can be customized based on business requirements, and analytics can be checked through system-generated reports. It is easy to access and control movements from any device through face mask detection applications. Partially occluded faces, whether covered by a mask, hair, or a hand, can also be detected.

9. Conclusion

To mitigate the spread of the COVID-19 pandemic, measures must be taken. To train, validate, and test the model, we used a dataset which contains 1,805 pictures of faces covered with a mask and 1,789 pictures of faces without a mask, taken from the Kaggle dataset. The model can also perform the same detection in real time using live video streams, so it can be used with cameras placed in public places. The model was evaluated on both pictures and live video streams. To choose a base model, we assessed metrics such as accuracy, precision, and recall, and selected the MobileNetV2 architecture, which gave the best performance with 100% accuracy and 99% recall. MobileNetV2 is also computationally efficient, which makes it easier to deploy the model to embedded systems. This face mask detection model can be used in numerous places such as shopping centers, airports, and other heavy-traffic locations to screen the public and to avoid the spread of the infection by checking who is adhering to the fundamental guidelines and who is not.



