1 Introduction

Before the pandemic, people wore masks mainly because of air pollution; scientists have since shown that mask wearing is one of the best ways to prevent COVID-19 transmission. According to the World Health Organization (WHO) weekly epidemiological update on COVID-19 of 18 January 2022, edition 75: “the number of new COVID-19 cases increased in the past week (10-16 January 2022), while the number of new deaths remained similar to that of the previous week. Across the six WHO regions, over 18 million new cases were reported this week, a 20% increase, as compared to the previous week. Over 45 000 new deaths were also reported. As of 16 January, over 323 million confirmed cases and over 5.5 million deaths have been reported worldwide. Despite a slowdown of the increase in case incidence at the global level, all regions reported an increase in the incidence of weekly cases with the exception of the African Region, which reported a 27% decrease. The South-East Asia region reported the largest increase in new cases last week (145%), followed by the Eastern Mediterranean Region (68%). New weekly deaths increased in the South-East Asia Region (12%) and Region of the Americas (7%) while remaining approximately the same as the previous week in the other regions” [28]. These statistics demonstrate how rapidly COVID-19 spreads, which motivated us to orient our research toward face mask detection.

Since the outbreak of the COVID-19 pandemic, various studies have addressed the detection of face masks, and most of them rely on deep learning using a Convolutional Neural Network (CNN) or MobileNet, which requires many parameters and is composed of a succession of convolution and pooling layers. With this design, the solution architecture is complex, relevant information may be lost during the feature extraction phase, and the computation time is considerable.

In this paper, we propose an efficient approach to detect the presence of a face mask on a human face, based on a Pulse-Coupled Neural Network (PCNN) and a Fully Connected Neural Network (FCNN). To accomplish this, we process the input image by detecting the eyes; once we have the eye positions, we can identify the region of interest (ROI) of the face. This ROI is the input of the PCNN, and a statistical module handles the binary output image by calculating parameters such as gray-level uniformity (GU), gray-level contrast (GC), entropy (E), and cross-entropy (CE) to help the decision module reach a verdict for a classic image; otherwise, the Fully Connected Neural Network (FCNN) decides for a complex case. To validate our approach, we applied it to the datasets used by the recent related works cited in the next section, and we obtained an improvement over them.

The main contributions of this work are listed below:

  1.

    This paper may help to reduce the fast spread of COVID-19.

  2.

    The solution does not need prior training for simple images.

  3.

    The method is simple, with a non-complex system architecture: only a few parameters are required, which yields a short computation time. Such a solution can be supported by embedded systems.

  4.

    With this approach, different types of face masks can be detected, such as standard or medical face masks.

  5.

    During the health crisis, police and gendarmes had a fundamental role in ensuring compliance with the rules applicable in public spaces. Face mask detection is part of this task and can be automated using this research.

The rest of the paper is organized as follows: Section 2 summarizes recent works related to our proposed approach. Section 3 describes the proposed method, followed by a performance comparison and discussion in Section 4. Finally, Section 5 concludes the paper. To ensure a good understanding of this article, Table 1 presents the list of abbreviations and definitions.

Table 1 Abbreviations table

2 Literature review

As researchers in the image processing area, we feel a responsibility to contribute to the non-propagation of COVID-19, and face mask detection is one method we can propose for this humanitarian purpose. Most recent research publications in this domain focus on deep learning with complex architectures, using a CNN to reach the best accuracy rate; in response, we take up the challenge of suggesting an approach without machine learning for the non-complex scenario, to compete with existing studies. The complicated case is still handled by a light fully connected neural network.

In 2020, when COVID-19 first appeared in Madagascar, the government mobilized the military to put indirect pressure on the population to wear face masks on roads and in public places. The objective was to make mask wearing a normal gesture and part of daily life. This military activity can be automated with an intelligent system, letting the military focus on their main duties; this benefits the government by avoiding resource wastage.

We opt for the PCNN for two reasons: first, it easily extracts the important information present in an image without training or a succession of convolution/pooling operations, and a decision can be taken directly if needed. Second, it is able to form an image signature. Note that the PCNN has a static architecture, independent of the type of image to be processed.

In this section, we list the works judged pertinent and related to this approach, in ascending chronological order.

Sammy V. Militante and Nanette V. Dionisio (2020) [22] published a paper entitled “Real-Time Facemask Recognition with Alarm System using Deep Learning”. This work applies the VGG-16 CNN deep learning technique to perform facial recognition and determine whether a person is wearing a face mask. The accuracy is around 96%, trained on a dataset containing 25,000 images at 224 × 224 pixel resolution. The system implements Raspberry Pi-based real-time face mask recognition that raises an alarm and captures the facial image if the detected person is not wearing a face mask. Apart from the camera and speaker, the Raspberry Pi block is composed of a face mask detection module, model loading, classification, and Google speech.

Loey, Mohamed, et al. (2021) [19] conducted research on face mask detection in “A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic”. The proposed study combines two methods: deep learning and classical machine learning. A standard CNN is normally composed of feature extraction layers and a classification layer. The authors kept the feature extraction layers of the ResNet-50 convolutional neural network and swapped the fully connected layers for a combination of decision trees, a Support Vector Machine (SVM), and an ensemble algorithm. The approach was validated on three datasets: the Real-World Masked Face Dataset (RMFD), the Simulated Masked Face Dataset (SMFD), and Labeled Faces in the Wild (LFW). The SVM classifier achieved 99.64% testing accuracy on RMFD, 99.49% on SMFD, and 100% on LFW.

Loey, Mohamed, et al. (2021) [20], the same authors as the previous work, published a paper entitled “Fighting against COVID-19: A novel deep learning model based on YOLO-v2 with ResNet-50 for medical face mask detection”. YOLO-v2 is used for real-time object detection, while ResNet-50 is a 50-layer convolutional neural network (48 convolutional layers, one max-pooling layer, and one average-pooling layer). The concept remains the same: feature extraction is ensured by the pooling and convolution layers of ResNet-50, and the next step detects medical face masks with YOLO-v2. They merged two medical face mask datasets to validate the method. To improve the object detection process, the mean IoU was used to estimate the best number of anchor boxes. The results showed that the Adam optimizer achieved the highest average precision as a detector, 81%. ResNet-50 and YOLO-v2 play an important role in the proposed detector model because, after CNN feature extraction, YOLO-v2 feeds the network training module to help the detector module reach a decision.

Bhawna R. and Udit U. (2021) [6] proposed a face mask detection method in “Face Mask Detection Using Convolutional Neural Network”. Three steps are followed: image preprocessing, extraction of pertinent information, and classification. Standard machine learning tools such as Scikit-Learn, Keras, OpenCV, and TensorFlow were used. The experiment was performed with 1376 images, with and without masks, taken from GitHub. 90% of this dataset is used for training and 10% for validation. The model achieved a high training accuracy of 99.1%, with a validation loss of 87.5% and a validation accuracy of 97.3%.

Riya C. and Rutva J. (2021) [25] published an article entitled “Detection of Face Mask using Convolution Neural Network”. They use Python libraries and MobileNet-v2 as the convolutional neural network, which plays the role of transfer learning. The advantage is that pretrained models are available for training the new model, which reduces processing time. To train, validate, and test the concept, they used a dataset consisting of 993 masked-face pictures and 1918 exposed-face pictures, taken from different sources such as the Kaggle and RMFD datasets. The success rate of detecting a person wearing a mask is around 96%, depending on the digital capabilities.

Gagandeep K., Ritesh S., et al. (2021) [12] present research results entitled “Face mask recognition system using CNN model”. The study is based on fundamental machine learning using well-known tools such as Scikit-Learn, TensorFlow, OpenCV, and Keras. Thanks to the optimization of the Convolutional Neural Network (CNN) parameter values, the technique performs well. Six steps are followed to attain this result: data collection (collect two datasets, with and without masks, then label the data), preprocessing (resize/convert the images, preprocess the data using mobilenet_v2, and perform one-hot encoding on the labels), data splitting (split the data for training and testing), model building (construct the training image generator for augmentation based on mobilenet_v2 with “imagenet” weights, add the model parameters, and save/compile the model), model testing (make predictions on the testing set, plot training loss and accuracy, and evaluate the model), and model implementation (read the video frame by frame and resize it, call the preprocessing function, and predict the input data with the saved model). The dataset was downloaded from the Kaggle.com website and contains 3832 images divided into two classes (1914 with mask and 1918 without mask); the performance was judged high.

Goyal, H., Sidana, K., Singh, C., et al. (2022) [15] proposed a face mask detection approach, “A real time face mask detection system using convolutional neural network”, based on a Convolutional Neural Network (CNN). The proposed model has three modules: data pre-processing (load the images, label each image with its class, pre-process the images, split the dataset into training and testing sets, and apply data augmentation), CNN model training (ReLU as the activation function for the hidden layers, five convolutional layers and five max-pooling layers, flatten and dense layers, Softmax for the output layer, training on the training dataset with the Adam optimizer, and saving the model), and face mask detector application (load the saved model, detect faces using the OpenCV face detector, pass them as input to the face mask detection model, and decide whether the person wears a mask or not). This technique was validated using a Kaggle dataset, and the accuracy reaches 98%, a considerable improvement compared with MobileNet-V2, DenseNet-121, Inception-V3, and VGG-19.

To address the problem of face mask recognition in facial images, Asghar M.Z. et al. (2022) [2] present another face mask detection method using a Depth-wise Separable Convolutional Neural Network based on MobileNet (DWS-based MobileNet) instead of 2D convolution layers. The model decreases the number of parameters involved in training by adopting a lightweight network. The first step is to collect images with and without masks and process them in the image processing module. Secondly, the depth-wise separable convolutional network based on MobileNet takes over the flow and sends its output to the model training/testing module before prediction. The overall accuracy is around 93% on two different datasets: AIZOO and Moxa3K.

3 Proposed model

The system architecture of the proposed model (Fig. 1) has three parts:

  1.

    a geometric module composed of eye detection and a mask filter. Once we have the eye positions, we can perform a projection to obtain the ROI of the mask and form the mask filter. This mask filter lets us extract the important region of the human face, which is processed by the next module.

  2.

    a feature extraction module, which combines the PCNN, extracting the pertinent information, with a statistical analysis. This submodule introduces the notions of GU, GC, E, CE, and OM. The evaluation of these parameters helps the decision module determine whether the person is wearing a mask.

  3.

    a decision module, which decides immediately when the input of the feature extraction module is non-complex; otherwise the FCNN does the computation.

Fig. 1 Face mask detection method

Before entering the first module, the input image must undergo preprocessing: it is converted to a gray image and then resized. Resizing is very important to guarantee the performance of the geometric module.

3.1 Geometric module

3.1.1 Eyes detection

In Matlab, a cascade object detector that uses the Viola-Jones algorithm is available. It can easily detect people’s faces, noses, eyes, mouths, or upper bodies. Our interest is in eye detection, and the code we use for this submodule is shared below in Fig. 2.

Fig. 2 Eyes detection code in Matlab
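Since the listing of Fig. 2 is not reproduced here, the following is a minimal equivalent sketch, assuming the Matlab Computer Vision Toolbox; the exact code of Fig. 2 may differ.

```matlab
% Minimal eye-detection sketch (assumption: Computer Vision Toolbox).
detector = vision.CascadeObjectDetector('EyePairBig'); % Viola-Jones eye-pair model
bboxEyes = step(detector, imgray);                     % one [x y w h] row per detection
imout = insertObjectAnnotation(imcolor, 'rectangle', bboxEyes, 'Eyes');
imshow(imout);                                         % Fig. 4-style visualization
```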

Here, “imgray” is the matrix of the input gray image and “imcolor” is the resized original color image. Let us take two images from the face mask dataset shown in Fig. 3, available on the data-fair.training website. Applying the code, we obtain the images in Fig. 4.

Fig. 3 Sample experimental input images

Fig. 4 Eyes detection using the Viola-Jones method

In general, the face mask lies below the eye region, so we must find a method to isolate the region of interest (ROI) with a mask filter.

3.1.2 Mask filter

Now we have the eye positions. Let A1(x1, y1) and A2(x2, y2) be the extreme points of the eyes. First, we compute the midpoint of the segment [A1A2] and draw a line (∆) parallel to [A1A2], perpendicular to the line that passes through the midpoint of [A1A2] perpendicularly to [A1A2]. Second, we compute the orthogonal projections of the internal points of the eye region onto (∆), obtaining two points A3 and A4. The mask region of interest is the region R formed by {A1, A2, A3, A4}. A detailed presentation is given in Fig. 5.

Fig. 5 ROI detection method

Applying this method to our experimental images, we obtain the results in Fig. 6.

Fig. 6 ROI marking

Isolating the ROI in the original gray image is the last step of the mask filter. All regions outside the ROI are set to black (gray level zero), giving the output in Fig. 7. A minimal sketch of this mask-filter step is given below.
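The following is a rough sketch only: the projection of Fig. 5 is approximated here by extending the eye-pair bounding box downward, and the depth ratio 1.2 is an illustrative assumption, not the authors’ exact construction.

```matlab
% Approximate mask filter built from the eye-pair bounding box of Fig. 2's code.
bb = bboxEyes(1, :);                                     % first detected eye pair
x1 = round(bb(1));  x2 = round(bb(1) + bb(3));           % abscissas of A1 and A2
yTop = round(bb(2) + bb(4));                             % just below the eye region
yBot = min(size(imgray, 1), yTop + round(1.2 * bb(3)));  % assumed depth of A3, A4 on (delta)
roiMask = false(size(imgray));
roiMask(yTop:yBot, x1:x2) = true;                        % region R = {A1, A2, A3, A4}
imroi = imgray;
imroi(~roiMask) = 0;                                     % outside the ROI forced to gray level zero
```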

Fig. 7 ROI isolation

3.2 Feature extraction module

3.2.1 Pulse couple neural network

The Pulse-Coupled Neural Network (PCNN) was introduced in 1990 and is recognized as a powerful tool in image processing, especially for image segmentation and edge detection [7]. It has two types of input: the feeding input, which carries the gray level of the corresponding pixel, and the linking input, which carries neighboring neuron information; the weights M and W provide the lateral connections. As shown in Fig. 8, a PCNN is divided into three parts: the inputs, the linking, and the pulse generator, which defines the threshold Θ compared with the internal activity U of the neuron to decide the output value Y (1 if the neural activity is above the threshold, 0 otherwise).

Fig. 8 PCNN diagram

The equations of each component are listed in Table 2, numbered (1) to (5); an advantage of the PCNN is that no training is required, only iteration.

Table 2 Equation list of each PCNN component

Here Sij is the external stimulus and (αL, αF, αθ, VF, VL, Vθ) are normalization constants [23]. The parameter values used in this study are given in Table 3 below, and the constant synaptic weights M and W are defined in Eq. (6). They were chosen according to the research by Xiangyu D. and Ma Y. [29] on how to choose PCNN parameter values for good performance, which we applied to our technique.

$$ M=W=\left[\begin{array}{ccc} 0.5 & 1.0 & 0.5 \\ 1.0 & 1.0 & 1.0 \\ 0.5 & 1.0 & 0.5 \end{array}\right] $$
(6)
Table 3 Parameters of PCNN in the experiments

The input image provides the initial values of the L, F, and S matrices. The convolution between a null matrix of the same size as the input image (R-by-C) and the weight matrix initializes the output Y of the PCNN. The initial value of the dynamic threshold Θ is an R-by-C matrix filled with the value 2.

Since the PCNN input image is simple (it is the ROI), 20 iterations are enough to obtain a well-segmented image. At each iteration, the PCNN generates a binary image. Johnson [17, 18] constructed a transform, a sum operation over each binary output image of the PCNN, from which a one-dimensional time series G[n] is obtained using Eq. (7), where Y[n] is the output binary image of the PCNN at the nth iteration; a sketch of the full iteration is given after Eq. (7).

$$ G\left[n\right]=\sum \limits_{i,j}{Y}_{i,j}\left[n\right] $$
(7)
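Since Table 2 is not reproduced here, the sketch below follows the standard PCNN formulation commonly associated with Eqs. (1)–(5); the decay and amplitude constants are illustrative placeholders, the values actually used being those of Table 3.

```matlab
% PCNN iteration sketch over the ROI image, with the time signature of Eq. (7).
S = double(imroi) / 255;                    % external stimulus, normalized
[R, C] = size(S);
M = [0.5 1 0.5; 1 1 1; 0.5 1 0.5];  W = M;  % synaptic weights, Eq. (6)
F = S;  L = S;                              % feeding and linking initialized with the image
Y = false(R, C);                            % null initial output, as in the text
Theta = 2 * ones(R, C);                     % initial dynamic threshold
beta = 0.2; aF = 0.1; aL = 0.3; aT = 0.2;   % assumed decay constants (see Table 3)
VF = 0.5;  VL = 0.2;  VT = 20;              % assumed amplitude constants (see Table 3)
nIter = 20;
G = zeros(1, nIter);
Yall = false(R, C, nIter);
for n = 1:nIter
    F = exp(-aF) * F + VF * conv2(double(Y), M, 'same') + S;  % feeding, Eq. (1)
    L = exp(-aL) * L + VL * conv2(double(Y), W, 'same');      % linking, Eq. (2)
    U = F .* (1 + beta * L);                                  % internal activity, Eq. (3)
    Y = U > Theta;                                            % pulse output, Eq. (4)
    Theta = exp(-aT) * Theta + VT * double(Y);                % threshold update, Eq. (5)
    G(n) = sum(Y(:));                                         % time series, Eq. (7)
    Yall(:, :, n) = Y;                                        % keep each binary image
end
```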

From the time series graph in Fig. 9, we can visually classify an image as with or without a mask: the time signature without a mask is lower than with a mask. We cannot stop at this observation; we need a way to make the processing automatic instead of relying on visual interpretation. The two figures below (Figs. 10 and 11) show the segmented image at each iteration. The problem we solve next is selecting the one image that contains the important information.

Fig. 9 Time series graph of the experimental images in Fig. 3

Fig. 10 PCNN output series without mask

Fig. 11 PCNN output series with mask

3.2.2 Statistical analysis

We need to identify the best segmented image across iterations. To that end, four parameters are considered to evaluate the quality of the image segmentation: Gray-level Uniformity (GU), Gray-level Contrast (GC), Entropy (E), and Cross-Entropy (CE).

  • Gray-level Uniformity

For a gray-level image f(x, y), let Ri be the ith segmented region, Ai the area of Ri (that is, the number of pixels contained in region Ri), and C the normalization factor; then the gray-level uniformity measure of f(x, y) [24] is defined as in Eq. (8).

$$ GU=1-\frac{1}{C}\sum \limits_i\sum \limits_{\left(x,y\right)\in {R}_i}{\left[f\left(x,y\right)-\frac{1}{A_i}\sum \limits_{\left(x,y\right)\in {R}_i}f\left(x,y\right)\right]}^2 $$
(8)
  • Gray-level Contrast

In a gray-level image f(x, y) consisting of an object with average gray-level fO and a background with average gray-level fb [24], a gray-level contrast measure can be calculated using Eq. (9).

$$ GC=\frac{\left|{f}_O-{f}_b\right|}{f_O+{f}_b} $$
(9)
  • Entropy

E, presented in Eq. (10), is the information entropy of the binary image; P1 is the probability of 1’s in the binary image, and P0 that of 0’s [10].

$$ E=-{P}_1{\mathit{\log}}_2\left({P}_1\right)-{P}_0{\mathit{\log}}_2\left({P}_0\right) $$
(10)
  • Cross-Entropy

Cross-entropy, Eq. (12), measures the information difference between two probability distributions (Eq. (11)). The minimum cross-entropy principle is used in image segmentation to search for the threshold that yields the least information difference between the images before and after segmentation [24].

For two probability distributions P = {p1, p2, …, pN} and Q = {q1, q2, …, qN}, their cross-entropy is defined as

$$ D\left(P,Q\right)=\sum \limits_{i=1}^N{p}_i\mathit{\ln}\frac{p_i}{q_i}+\sum \limits_{i=1}^N{q}_i\mathit{\ln}\frac{q_i}{p_i} $$
(11)

In image segmentation, P and Q represent the original image and the segmented one respectively. Then

$$ C\left(P,Q;t\right)=\sum \limits_{f=1}^{t}\left[f\cdot h(f)\cdot \ln \frac{f}{\mu_1(t)}+{\mu}_1(t)\cdot h(f)\cdot \ln \frac{\mu_1(t)}{f}\right]+\sum \limits_{f=t+1}^{F}\left[f\cdot h(f)\cdot \ln \frac{f}{\mu_2(t)}+{\mu}_2(t)\cdot h(f)\cdot \ln \frac{\mu_2(t)}{f}\right] $$
(12)

where f is the gray level, F is the maximal value of f, t is the segmentation threshold, and h(f) is the histogram of the original image. μ1 and μ2 are the average gray levels of the object and the background, respectively. They can be calculated from h(f), as shown in Eqs. (13) and (14) [24]; a computational sketch follows these equations.

$$ {\mu}_1(t)=\frac{1}{\sum_{f=0}^th(f)}\sum \limits_{f=0}^tf.h(f) $$
(13)
$$ {\mu}_2(t)=\frac{1}{\sum_{f=t+1}^Fh(f)}\sum \limits_{f=t+1}^Ff.h(f) $$
(14)
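As a complement, here is a hedged Matlab transcription of Eqs. (12)–(14) for the gray ROI image. The threshold t (taken as the segmentation threshold of the iteration under evaluation) and the restriction of the Eq. (12) sums to f ≥ 1, where the logarithm is defined, are assumptions of this sketch.

```matlab
% Cross-entropy of the ROI image imroi (uint8) for a threshold t in [0, 255].
h = imhist(imroi);                                     % h(k) = count of gray level k-1
f = (0:255)';
lo0 = f <= t;  hi = f > t;
mu1 = sum(f(lo0) .* h(lo0)) / max(sum(h(lo0)), eps);   % Eq. (13)
mu2 = sum(f(hi)  .* h(hi))  / max(sum(h(hi)),  eps);   % Eq. (14)
lo = lo0 & (f >= 1);                                   % Eq. (12) sums start at f = 1
CE = sum(f(lo).*h(lo).*log(f(lo)/mu1) + mu1*h(lo).*log(mu1 ./ f(lo))) ...
   + sum(f(hi).*h(hi).*log(f(hi)/mu2) + mu2*h(hi).*log(mu2 ./ f(hi)));  % Eq. (12)
```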

We calculate GU, GC, CE, E, and the Overall Merit (OM) of Eq. (15) for both experimental images. The results for each iteration are shown in Tables 4 and 5; a sketch of these per-iteration scores is given below the table captions.

$$ OM=\left[ GU+ GC+\left(1- CE\right)+E\right]/4 $$
(15)
Table 4 Experimental values of GU/GC/CE/E/OM for Fig. 10 (without mask)
Table 5 Experimental values of GU/GC/CE/E/OM for Fig. 11 (with mask)
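The following sketch assembles the per-iteration scores feeding Tables 4 and 5; GU and CE are assumed already computed via Eqs. (8) and (12), with CE assumed normalized to [0, 1] before entering Eq. (15), and Yn is the nth binary output of the PCNN loop above.

```matlab
% Per-iteration quality scores for the nth PCNN output image.
Yn = Yall(:, :, n);
p1 = mean(Yn(:));  p0 = 1 - p1;
E  = -p1 * log2(max(p1, eps)) - p0 * log2(max(p0, eps));   % entropy, Eq. (10)
fO = mean(S(Yn));  fb = mean(S(~Yn));                      % object/background means
GC = abs(fO - fb) / max(fO + fb, eps);                     % contrast, Eq. (9)
OM = (GU + GC + (1 - CE) + E) / 4;                         % overall merit, Eq. (15)
```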

To select an excellent segmented image, the entropy must be maximal, or the cross-entropy minimal, or the Overall Merit maximal. We check these criteria in the tables, and if one iteration satisfies more than one criterion, the corresponding image is the one used for the next processing step. If there are three distinct image candidates, we directly choose the one with the maximum OM; a selection sketch is given below.
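One way to automate this selection rule is a simple vote among the three criteria. In this sketch, Evals, CEvals, and OMvals are assumed per-iteration score vectors, and the fallback to the maximum OM follows the rule stated above.

```matlab
% Pick the iteration satisfying the most criteria; break three-way ties by max OM.
[~, iE]  = max(Evals);                 % maximum entropy
[~, iCE] = min(CEvals);                % minimum cross-entropy
[~, iOM] = max(OMvals);                % maximum overall merit
votes = histcounts([iE, iCE, iOM], 0.5:1:(numel(OMvals) + 0.5));
[nWins, best] = max(votes);
if nWins < 2, best = iOM; end          % three distinct candidates: keep max OM
Ybest = Yall(:, :, best);              % image passed to the decision module
```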

Four images met the conditions in our experiments, as per the information in Table 6. We therefore select image n°17 in Fig. 10 and image n°15 in Fig. 11. In summary, these measurements are used to find the best quality of image segmentation and edge detection.

Table 6 Experimental choice of segmented images

3.3 Decision

In the normal case, the mouth and nose regions are present when people do not wear a face mask; with a mask, at most one region is present, depending on the mask type. For a standard face mask there is no distinctive region, only the background. The decision is made according to the number of regions present in the ROI: we count the regions, and if the count is two or more, the processing flow continues with deep learning; otherwise we judge that the person is wearing a mask. A minimal counting sketch follows.
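In this sketch, regions are assumed to be 8-connected components of the selected binary image restricted to the ROI, and classifyWithFCNN is a hypothetical placeholder for the FCNN stage described next.

```matlab
% Region-count decision (assumption: Image Processing Toolbox for bwlabel).
[~, numRegions] = bwlabel(Ybest & roiMask);   % connected components inside the ROI
if numRegions >= 2
    maskWorn = classifyWithFCNN(features);    % hypothetical call to the FCNN stage
else
    maskWorn = true;                          % at most one region: mask judged present
end
```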

When deep learning is needed, the FCNN handles the processing. At least five hidden layers are required, and in this paper we fix the number to five. Their activation function is the sigmoid function, and all weights are initialized randomly. The output layer has two neurons (with mask and without mask); the neuron with the highest probability value determines whether the image contains a face mask, and the Softmax activation function ensures this probability format. The input layer has ten neurons, fed with {GU1, GC1, CE1, E1, OM1, GU2, GC2, CE2, E2, OM2}. Fig. 12 illustrates this architecture. The percentage of images allocated for testing depends on the researcher’s choice, but it is important that the training set be larger than the testing set. The next subsection details the training and the description of the datasets, which are publicly available.

Fig. 12 FCNN architecture
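As a complement to Fig. 12, here is a hedged sketch of the architecture, assuming the Matlab Deep Learning Toolbox; the hidden width of 16 neurons is an illustrative choice, not a value from the paper.

```matlab
% FCNN sketch: 10 statistical inputs, five sigmoid hidden layers, softmax output.
layers = [
    featureInputLayer(10)          % {GU1,GC1,CE1,E1,OM1, GU2,GC2,CE2,E2,OM2}
    fullyConnectedLayer(16)
    sigmoidLayer                   % hidden layer 1 of 5
    fullyConnectedLayer(16)
    sigmoidLayer                   % hidden layer 2
    fullyConnectedLayer(16)
    sigmoidLayer                   % hidden layer 3
    fullyConnectedLayer(16)
    sigmoidLayer                   % hidden layer 4
    fullyConnectedLayer(16)
    sigmoidLayer                   % hidden layer 5
    fullyConnectedLayer(2)         % two classes: with mask / without mask
    softmaxLayer
    classificationLayer];
options = trainingOptions('adam', 'MaxEpochs', 80);    % 80 epochs, cf. Section 3.4
% net = trainNetwork(Xtrain, Ytrain, layers, options); % Xtrain: N-by-10 feature matrix
```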

3.4 Datasets training and testing

For training and testing, we evaluate our method on the six datasets used by the authors in [2, 6, 12, 15, 20, 25]. The reason for this choice is to compare with the related works on the same reference data. Table 7 describes the contents of all datasets used in this work, and sample images are presented in Fig. 13.

Table 7 Dataset description
Fig. 13 Preview of the datasets

The number of epochs for each dataset is limited to 80, and we observe that from the 65th epoch the results become acceptable, except for dataset 5. The training and validation trends are shown in Fig. 14.

Fig. 14 Accuracy graph

4 Performance comparison and discussion

The proposed method can work without deep learning for simple images; however, neural network training is required for complex situations. In that sense, the system we propose here is a lite version.

Compared with the related works in [6, 20, 25], we obtain better accuracy. We have some difficulty deciding when the test image comes from the MMD, because some medical face masks have complicated shapes and some images contain multiple faces with and without masks. For this reason, the accuracy there is below 84%. The accuracy comparison is presented in Table 8.

Table 8 Accuracy comparison

We obtain the best performance with dataset 1 because the face masks worn follow a similar model. In the research by Bhawna R. & Udit U. [6], 90% of the dataset is used for training and 10% for testing; the accuracy, measured on the 10%, was 99.1%. Applying our algorithm to the same dataset without deep learning, we obtain 99.6% on the training base and 100% on the test set, which is perfect.

For dataset 2, the experiment is the same as with dataset 1. The slight difficulty we faced concerns multicolored face masks, which create multiple regions. Regarding computation time, our approach needs around 2.36 milliseconds to decide for one image, whereas the model of Asghar M.Z. et al. (2022) [2] classifies 1020 images in 2.5 seconds (about 2.45 milliseconds per image).

Before closing this section, we share in Table 9 below the advantages/drawbacks of each method, and in Table 10 we report further performance indicators, namely precision, recall, and F1 score [1, 3, 14, 21], defined in Eqs. (16)–(18).

$$ Precision=\frac{Tp}{Tp+ Fp} $$
(16)
$$ Recall=\frac{Tp}{Tp+ Fn} $$
(17)
$$ F1=\frac{2\ast \left( Precision\ast Recall\ \right)}{Precision+ Recall} $$
(18)

where Tp, Tn, Fp, and Fn are respectively the numbers of true positives, true negatives, false positives, and false negatives.
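These formulas translate directly into code; for completeness, here is a one-to-one transcription from the confusion counts.

```matlab
% Direct transcription of Eqs. (16)-(18).
precision = Tp / (Tp + Fp);                           % Eq. (16)
recall    = Tp / (Tp + Fn);                           % Eq. (17)
F1 = 2 * (precision * recall) / (precision + recall); % Eq. (18)
```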

Table 9 Advantages and drawbacks of each technique
Table 10 Additional performance metrics comparison

True positives are images accurately predicted as belonging to the positive class, whereas false positives are images incorrectly predicted as positive. True negatives are images accurately predicted as negative, whereas false negatives are images incorrectly predicted as negative [15].

We can see that our method leads in performance, especially on dataset 4.

5 Conclusion

In this research paper, we proposed a method to detect whether people wear a face mask or have an exposed face. The objective is to slow down the spread of COVID-19, and we contribute a system that triggers an alarm when a person arrives without a mask at a particular public place. The technique is based on the combination of the PCNN and FCNN neural networks and is divided into three modules: a geometric module, which detects the eyes and, from the eye positions, deduces the mask filter to obtain the region of interest; a feature extraction module, in which the PCNN extracts the pertinent information by selecting one well-segmented image and one edge-detection image, applying the stated criteria to select the final candidate image whose regions we count; and a decision module, which analyses the number of regions: if it is at most one, the FCNN does not need to intervene; otherwise the decision is taken by the pretrained neural network. Thanks to this design, the response time is minimal. Concerning accuracy, we validated our research on different datasets (Kaggle, AIZOO, Moxa3K, the Real-World Masked Face Dataset, the Medical Masks Dataset, and the Face Mask Dataset); the average accuracy is around 86.68%, varying from 83.2% to 100%. With this performance, the approach can detect the presence or absence of a mask on a human face and contributes to reducing the spread of COVID-19: once people break the imposed rule, an alarm is triggered and their access to the specific place is blocked. Therefore, the presence of many security agents or soldiers is not required, and they can handle other tasks. Meanwhile, the suggested technique is weak when several faces appear in a single image; it may be improved by detecting the upper body first and then the eyes, instead of detecting the eyes directly. In future work, we would like to build a system able to check whether a face mask is worn incorrectly, while also taking into account the physical distance between people.