Face Mask Detection Using Machine Learning and Deep Learning
Abstract: In this research, we put forward an architecture that combines recent deep learning algorithms with geometry techniques to create robust models that can handle aspects like detection, tracking and validation. This dissertation focuses on creating a model that efficiently combines conventional machine learning and deep learning techniques to categorize facemasks effectively. Our model comprises two components. The first, dimensionality reduction through feature extraction, is produced with InceptionV3. The second, the facemask classification procedure, is developed with the Logistic Regression (LR) method. Deep learning (DL) is used to efficiently train an architecture on a dataset of photos of people's faces with, without and with partially worn face masks in order to extract features. The extracted features are then passed into various classification algorithms, namely Random Forest, Logistic Regression, CNN, Support Vector Machine, AdaBoost and K-Nearest Neighbors, to classify the mask's position. Hence, we can project that employing Transfer Learning (TL) and Deep Learning together can detect a properly or improperly worn face mask with high accuracy. This system design can help curb transmission of the virus by detecting individuals in urban areas who are not wearing facemasks effectively.

recognizes the face from the image and then determines whether a mask is worn correctly, partially or not at all.

The machine learning algorithms, paired with various pre-trained deep learning architectures, were trained on 66% of our dataset and evaluated on the remaining 34%, i.e. our test dataset. For each algorithm and each popular feature extraction model, we obtained the Area Under Curve (AUC), Recall, F1, Precision and Classification Accuracy (CA), and visualized the results in the form of a ROC curve and a confusion matrix. A principal challenge confronted by our team was finding and consolidating images for the category of partial masks; this category consists of a mixture of mouth-chin, nose-mouth and chin-only covered images, which helps the model produce a more accurate analysis of whether a mask is worn correctly, as shown in Fig. 1. Another major problem was finding images of masked faces with the right orientation.
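The 66%/34% train/test protocol described above can be illustrated with a minimal sketch. It assumes a simple shuffled split without stratification; the function name `split_66_34`, the seed and the placeholder image identifiers are hypothetical, and only the 66:34 ratio and the dataset size of 767 images (reported later in the paper) come from the text.

```python
import random


def split_66_34(items, train_fraction=0.66, seed=0):
    """Shuffle items reproducibly and split them into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the 767 images of the dataset
image_ids = [f"img_{i:03d}" for i in range(767)]
train_ids, test_ids = split_66_34(image_ids)
print(len(train_ids), len(test_ids))
```

A stratified split, which keeps the proportion of full-mask, partial-mask and no-mask images equal in both partitions, would be a natural refinement; whether the authors stratified is not stated.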
attention in recent months. While personally checking whether individuals are wearing facemasks in public and busy places is cumbersome, researchers developed a real-time automated model that determines whether people are using facemasks and helps maintain social distancing at crowded places, utilizing computer vision and a Raspberry Pi to generate accurate feedback through reports. After analysis, we found that this study uses three datasets. The first is the Simulated Masked Face Dataset (SMFD), the second is Labeled Faces in the Wild (LFW) and the third is the Real-World Masked Face Dataset (RMFD) [2]. On SMFD, the SVM learning algorithm obtained 99.49 percent accuracy. On LFW it obtained 100 percent testing accuracy, whereas on RMFD it scored 99.64 percent.

Another significant contribution in this domain is the Retina Face Mask Detector, an efficient face mask analyzer that detects whether individuals are wearing their masks or not [3]. This architecture is a single-stage detector that includes a novel context attention module focused on face mask detection and a feature pyramid network that fuses high-level semantic information with multiple feature maps. It also applies a cross-class object removal algorithm that rejects predictions with low confidence and high intersection over union. Retina Face Mask achieves state-of-the-art results on a public facemask dataset, with face and mask detection precision 2.3 percent and 1.5 percent higher than the baseline, respectively, and recall 11.0 percent and 5.9 percent higher than the baseline.

III. METHODOLOGY
We imported our dataset, which was divided into images with full masks, partial masks (mouth-chin / only chin) and normal faces without masks, as shown in Fig. 1. The following feature extraction models were then applied to our dataset: Inception V3, VGG-16 and VGG-19. The resulting data is split into training and testing sets at a constant ratio of 66:34. Two experiments were conducted to establish a system for identifying appropriately worn face masks. First, a comparison of three different deep features and six machine learning algorithms was carried out. Following that, the best tuples were compared against the average accuracies obtained with 5-fold cross-validation, computed for all efficacy trials (one partition for the training set and one for the test set) [4].

Fig. 2. The architecture for the proposed model for mask detection using pre-trained CNN models.

We used the NN, LR, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), AdaBoost and Random Forest techniques in the image classification process. For each of the three feature extraction models, LR achieved the highest accuracy in our results [5].

In this statistical analysis, conducted for image classification [6], performance measures like Recall, F1-Score, Precision and Accuracy have been calculated using equations (1), (2), (3) and (4).

Precision = TP / (TP + FP)    (1)

Recall = TP / (TP + FN)    (2)

F1-Score = 2TP / (2TP + FP + FN)    (3)

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (4)

The AUC score is defined in terms of the ROC curves shown in Fig. 3, Fig. 4 and Fig. 5. A ROC curve visualizes the trade-off between the True Positive Rate (TPR) and the False Positive Rate (FPR), whose values are given by equations (5) and (6):

True Positive Rate (TPR) = TP / (TP + FN)    (5)

False Positive Rate (FPR) = FP / (FP + TN)    (6)

Here, TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative.

The CNN [7] was built mainly for image classification tasks. In this architecture, convolution, pooling and activation function layers are stacked on top of one another to a certain depth. Finally, for classification, an output layer is added after a fully connected layer; it computes the classification probabilities of the picture given as input to the CNN, using a SoftMax layer to map them onto specific classes. The
convolution operations are performed by the convolutional layers on these images using many filters that are very small relative to the input photos (such as 3 x 3 or 5 x 5). The weights associated with these filters are learned during training. Training a CNN-based network takes a long time, measured in hours or days.

Additionally, CNN models require a large amount of data to produce informative feature maps. The introduction of Transfer Learning (TL) [8,9] has effectively overcome this problem. In this approach, pre-trained models are used with the weights of their initial layers frozen. Our dataset is then used to retrain only the final few layers, which extract the abstract features from the images, since the core traits are consistent across images. InceptionV3 and the Visual Geometry Group models VGG16 and VGG19 are cutting-edge CNN architectures that employ this principle.

A. Models Used
We employed popular image classifiers, namely Logistic Regression, SVM [12], K-Nearest Neighbors, Random Forest and AdaBoost. To obtain alternative findings, we experimented with three distinct feature extraction models [13]: Inception V3, VGG-16 and VGG-19. For all three feature extraction models, we examined our results through the ROC curve and metrics such as AUC, Accuracy, F1 score, Precision and Recall [14].

• Inception V3 feature model

Fig. 3 shows the ROC curve for the Inception V3 model [15] and different machine learning classifiers.

Fig. 4. ROC curve representing the VGG16 model and several classifiers.
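To make the ROC curves in these figures concrete, the following minimal sketch shows how the (FPR, TPR) points and the AUC could be computed from classifier scores. It is an illustration under stated assumptions, not the authors' implementation: it treats one class as positive and the rest as negative (a one-vs-rest reading of the three-class problem), assumes both classes are present, and uses made-up labels and scores. The names `roc_points` and `auc` are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier confidence that each sample is positive.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Visit samples from the highest score to the lowest
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample in a group of tied scores
        is_last = i == len(ranked) - 1
        if is_last or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy, made-up data for illustration only (not results from the paper)
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve)
print(round(auc(curve), 3))
```

An AUC of 1.0 corresponds to a classifier that ranks every positive sample above every negative one, while 0.5 corresponds to random ranking.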
categorization model. They are commonly utilized because they provide a more accurate picture of a model's performance than classification accuracy.

TABLE II: COMPARISON OF MACHINE LEARNING ALGORITHMS
CLASSIFIER IN INCEPTION V3 MODEL

             Performance Measures
Classifier   AUC      Accuracy   F1       Precision   Recall
NN           96.0%    89.0%      88.9%    89.1%       89.0%
SVM          96.6%    88.7%      88.5%    88.6%       88.7%
KNN          87.4%    79.1%      78.0%    77.8%       79.1%
RF           88.0%    79.1%      78.0%    77.8%       79.2%
AdaBoost     75.4%    71.6%      71.8%    72.2%       71.6%
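The performance measures reported in the table follow equations (1) to (6). As a minimal sketch, they can be computed directly from the four confusion-matrix counts; the function name `classification_metrics` is hypothetical and the counts below are made up for illustration, not taken from the paper's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the measures of equations (1)-(6) from confusion-matrix counts."""
    recall = tp / (tp + fn)  # equation (2); identical to TPR, equation (5)
    return {
        "precision": tp / (tp + fp),                   # equation (1)
        "recall": recall,
        "f1": 2 * tp / (2 * tp + fp + fn),             # equation (3)
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # equation (4)
        "tpr": recall,
        "fpr": fp / (fp + tn),                         # equation (6)
    }


# Made-up counts for illustration only
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a three-class problem such as this one, these binary measures would typically be computed per class and then averaged; which averaging scheme the authors used is not stated.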
96.1 percent among the two other popular deep learning architectures (VGG 16 and VGG 19) and several different predictive analytics approaches that were evaluated. On the assessed pictures, our implementation outperformed other state-of-the-art algorithms for the task of face mask recognition. We analyzed pictures of three different mask detection scenarios by extracting bottleneck features, using a dataset of 767 pictures in which 66% of the images were used to train the model and the remaining 34% were used to assess it. Since our approach assigns an image to one of three groups, it computes a distinct classification probability for each class involved. The image is then assigned to the class with the highest probability. The hybrid technique used here to identify face masks is novel. In general, there are not enough instances to train a deep architecture from scratch. Our technique provides a realistic solution to employ CNNs which eliminates their need to generate hand-crafted

In practice, this model can be employed in several settings to determine whether a person is wearing their mask correctly. There are numerous applications for face mask detection. For example, in airports, visitors' faces can be promptly and effectively captured by the system at the entry. Hospitals can use a face mask detection system to determine whether or not their employees are wearing masks. Furthermore, for quarantined individuals who are required to wear a mask, the system can monitor whether or not the mask is present. It may also be used in offices to determine whether employees are adhering to workplace safety regulations. Expanding the collection of images could make this CNN-based framework more powerful in the near future.

Finally, this research opens up new avenues for future work. First, the suggested approach is not restricted to mask detection and may be integrated into any high-resolution video surveillance system. Second, this framework may be extended to recognize facial landmarks while a facemask is worn, for biometric purposes.

forest for snow cover mapping," in Proc. of 2nd International Conference on Computer Vision & Image Processing, pp. 279-287, Springer, Singapore.
[6] S. Gupta, A. Panwar, S. Goel, A. Mittal, R. Nijhawan and A. K. Singh, "Classification of Lesions in Retinal Fundus Images for Diabetic Retinopathy Using Transfer Learning," International Conference on Information Technology (ICIT), 2019, pp. 342-347.
[7] A. Chavda, J. Dsouza, S. Badgujar and A. Damani, "Multi-Stage CNN Architecture for Face Mask Detection," 2021 6th International Conference for Convergence in Technology (I2CT), 2021, pp. 1-8.
[8] G. J. Chowdary, N. S. Punn, S. K. Sonbhadra and S. Agarwal, "Face mask detection using transfer learning of InceptionV3," in International Conference on Big Data Analytics, 2020, pp. 81-90.
[9] M. E. H. Chowdhury et al., "Can AI Help in Screening Viral and COVID-19 Pneumonia?," IEEE Access, vol. 8, pp. 132665-132676, 2020.
[10] M. Tripathi, "Analysis of Convolutional Neural Network based Image Classification Techniques," Journal of Innovative Image Processing (JIIP), vol. 3, no. 2, pp. 100-117, 2018.
[11] A. Cabani, K. Hammoudi, H. Benhabiles and M. Melkemi, "MaskedFace-Net – A dataset of correctly/incorrectly masked face images in the context of COVID-19," Smart Health, pp. 1-5, 2021.
[12] D. Varshni, K. Thakral, L. Agarwal, R. Nijhawan and A. Mittal, "Pneumonia Detection Using CNN based Feature Extraction," 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), 2019, pp. 1-7.
[13] R. Nijhawan, H. Sharma, H. Sahni and A. Batra, "A Deep Learning Hybrid CNN Framework Approach for Vegetation Cover Mapping Using Deep Features," 2017 13th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), 2017, pp. 192-196.
[14] S. S. Rawat, A. Bisht and R. Nijhawan, "A Classifier Approach using Deep Learning for Human Activity Recognition," 2019 Fifth International Conference on Image Information Processing (ICIIP), 2019, pp. 486-490.
[15] Y.-C. Hsieh, C.-L. Chin, C.-S. Wei, I.-M. Chen, P.-Y.
