arXiv:2103.02805v1 [cs.CV] 4 Mar 2021
When Face Recognition Meets Occlusion: A New Benchmark

Baojin Huang, Zhongyuan Wang, Guangcheng Wang, Kui Jiang, Kangli Zeng, Zhen Han, Xin Tian, Yuhong Yang

Abstract—The existing face recognition datasets usually lack occlusion samples, which hinders the development of face recognition. Especially during the COVID-19 epidemic, wearing a mask has become an effective means of preventing the spread of the virus. Traditional CNN-based face recognition models trained on existing datasets are almost ineffective under heavy occlusion. To this end, we pioneer a simulated occlusion face recognition dataset. In particular, we first collect a variety of glasses and masks as occluders, and randomly combine the occlusion attributes (occlusion objects, textures, and colors) to achieve a large number of more realistic occlusion types. We then cover them at the proper position of the face image according to normal occlusion habits. Furthermore, we reasonably combine original normal face images and occluded face images to form our final dataset, termed Webface-OCC. It covers 804,704 face images of 10,575 subjects, with diverse occlusion types to ensure its diversity and stability. Extensive experiments on public datasets show that ArcFace retrained on our dataset significantly outperforms the state-of-the-art methods. Webface-OCC is available at https://github.com/Baojin-Huang/Webface-OCC.

Index Terms—Face recognition, occlusion face dataset, occlusion simulation

Fig. 1: Some samples of training and testing images in recent occlusion face recognition papers. (a) MaskNet [6], (b) PSDN [7], (c) wID [8], (d) Ours.

I. INTRODUCTION

With the development of face recognition technologies based on deep learning, many face recognition methods [1], [2], [3], [4] under normal scenes have achieved impressive performance, even exceeding human recognition ability on the benchmark dataset [5]. However, when the face image is occluded in actual unrestricted scenes, the recognition accuracy drops sharply. To eliminate the influence of occlusion on face recognition accuracy, researchers mostly train deep networks to be "familiar" with the occluded areas of face images, thereby weakening or inpainting occlusion components. Note that there is currently no open-source occlusion face recognition dataset; the existing occluded face recognition methods based on deep learning all synthesize their own occluded face images to train the network.

As shown in Fig. 1, we list some samples of occluded images from recent occlusion face recognition papers. Wan et al. [6] synthesized occlusion images with random black blocks of sizes n = 40, 50, 60, 70, respectively. Such a single occlusion type is unnatural and not robust. Song et al. [7] masked face images with three types of occlusions. Although this increases the diversity of occlusion types, it is far from actual occlusion situations (block position and size). Recently, to generate occluded faces, Ge et al. [8] cover real face images with an m × m (e.g., m = 48) mask at random positions. Obviously, this synthesis method is too blunt to adapt to actual recognition needs. The GAN-based method [9] can generate visually natural occluded face images, but it objectively changes the image's detailed information; a recognizer trained on these images does not perform well in real scenes. In brief, to synthesize occlusion images, most current methods cover normal images with unrealistic occlusion masks, deviating from the actual situation.

To address these drawbacks, we propose the Webface-OCC dataset to improve the performance of occluded face recognition in real scenes. In the Webface-OCC dataset, we fully consider the occlusion type and position when synthesizing the occluded face images, as shown in Fig. 1 (d). In particular, we first collect a variety of occlusion types of glasses and masks, and randomly combine the occlusion attributes (occlusion objects, textures, and colors) to achieve a large number of more realistic occlusion types. We then cover them at the proper position of the face image according to normal occlusion habits. Besides, we design a reasonable combination method for simulating occlusion of face images. Extensive experimental results on simulated and real-world masked face datasets confirm our built dataset's substantial superiority in terms of accuracy, without reducing the original effect on the general face recognition dataset.

B. Huang, Z. Wang, G. Wang, K. Jiang, K. Zeng, Z. Han, X. Tian, and Y. Yang are with the National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan 430072, China. Thanks to the National Natural Science Foundation of China (U1903214, U1736206, 62071339, 62072347, 61971315, 62072350) for funding. The numerical calculations in this paper have been done on the supercomputing system in the Supercomputing Center of Wuhan University. (Corresponding author: Zhongyuan Wang, wzy_hope@163.com.)
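The random attribute combination described above (occlusion objects × textures × colors) can be sketched as follows. This is a minimal illustration: the pool names and sizes are hypothetical placeholders, not the actual template assets used for Webface-OCC.

```python
import itertools
import random

# Hypothetical attribute pools; the real dataset draws from its own
# template library (glasses/mask shapes, ~30 textures, several colors).
objects = ["surgical_mask", "cloth_mask", "n95_mask", "glasses", "sunglasses"]
textures = [f"texture_{i:02d}" for i in range(30)]
colors = ["white", "black", "blue", "pink", "plaid"]

# Every (object, texture, color) triple is one simulated occlusion type.
all_types = list(itertools.product(objects, textures, colors))
assert len(all_types) == 5 * 30 * 5  # 750 distinct combinations

# At synthesis time, one combination is drawn at random per face image.
rng = random.Random(0)
occluder, texture, color = rng.choice(all_types)
```

Even small pools multiply into hundreds of occlusion types, which is how a handful of templates yields the diversity the abstract describes.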

Fig. 2: Samples of occlusion face images in Webface-OCC. The first row shows normal faces, and the second and third rows are their
corresponding occlusion faces.
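The landmark-driven overlay at the heart of the occlusion simulation (Section II-A) can be sketched in pure NumPy under strong simplifying assumptions: only two anchor landmarks instead of the full 64-point mapping, nearest-neighbour warping, and a binary alpha mask. All names and the toy template here are illustrative, not the authors' actual pipeline.

```python
import numpy as np

def similarity_from_two_points(src, dst):
    """Solve the 2D similarity transform z -> a*z + b (points treated as
    complex numbers) that maps the two src anchor points onto the two dst
    points; this captures rotation, uniform scale, and translation."""
    s0, s1 = (complex(x, y) for x, y in src)
    d0, d1 = (complex(x, y) for x, y in dst)
    a = (d1 - d0) / (s1 - s0)
    b = d0 - a * s0
    return a, b

def paste_occluder(face, occluder, alpha, a, b):
    """Warp the occluder template onto the face by inverse-mapping each
    face pixel back into template coordinates (nearest-neighbour)."""
    out = face.copy()
    H, W = face.shape[:2]
    h, w = occluder.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    z = (xs + 1j * ys - b) / a              # invert z -> a*z + b
    ox = np.round(z.real).astype(int)
    oy = np.round(z.imag).astype(int)
    inside = (ox >= 0) & (ox < w) & (oy >= 0) & (oy < h)
    covered = inside & (alpha[oy.clip(0, h - 1), ox.clip(0, w - 1)] > 0)
    out[covered] = occluder[oy[covered], ox[covered]]
    return out

# Toy usage: a plain white 20x40 "mask" template whose two anchor points
# (assumed template mouth corners) are pinned to two assumed face landmarks.
face = np.zeros((112, 112, 3), dtype=np.uint8)
mask = np.full((20, 40, 3), 255, dtype=np.uint8)
alpha = np.ones((20, 40))
a, b = similarity_from_two_points([(0, 10), (39, 10)], [(36, 80), (76, 80)])
occluded = paste_occluder(face, mask, alpha, a, b)
```

The real pipeline fits angle and size from many landmark correspondences and uses textured templates, but the mechanics are the same: estimate a transform from key-point pairs, then composite the occluder over the aligned face.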

II. PROPOSED OCCLUSION FACE RECOGNITION DATASET

The CASIA-Webface [10] dataset, as a common face recognition dataset, contains only a small portion of occlusion samples. Various CNN-based face recognition models [1], [2], [3], [4] trained on this dataset can achieve good performance in recognizing faces with small occlusions. It is conceivable that to improve the recognition performance of a model on occluded face images, a large-scale occlusion sample dataset is essential. To increase the proportion of occluded samples in CASIA-Webface, we combine the key points of the face with multiple occlusion types to achieve occlusion simulation with as much authenticity and diversity as possible. In this section, we introduce the process of occlusion simulation and present the basic statistics of our dataset.

A. Occlusion Simulation

Based on the fact that real face occlusion blocks the facial features, we cover common occlusion types (glasses, masks, etc.) on the normal face image with the help of known facial key points. We collect several occlusion templates from various natural scenes to enrich occlusion diversity and authenticity. Meanwhile, we adopt an accurate face key point detection model to obtain the face image's key point information.

For occlusion types, including occlusion objects, textures, and colors, as shown in Fig. 3, these occlusion attributes can be combined to achieve a large number of more realistic occlusion types, which improves the occlusion diversity of large-scale face datasets. For the key points of the face, we obtain the 64 landmarks of the normal face by face alignment [11]. At the same time, we establish the key point mapping between the occluder and the face image, and adjust the angle and size of the occluder so that it closely fits the normal face image. In this way, we construct a simulated occlusion face recognition dataset covering 804,704 face images of 10,575 subjects.

Fig. 3: Texture of occluder (a) and type of occluder (b) in Fig. 2.

B. Statistics

After simulating all 494,414 images, we keep part of the original face images to ensure the stability of the dataset on normal face image recognition. In practice, the simulated occlusion face dataset can be used along with the original unmasked counterparts, so the dataset contains both occluded and normal faces for the identities. The dataset is processed in terms of face alignment and image dimensions; each image has a dimension of 112 × 112 × 3. The statistics of our simulated Webface-OCC are listed in Fig. 4. The proportions of various masks and glasses are relatively uniform. It is worth noting that the surgical mask, as a very common occluder, provides more samples in our dataset. Note that normal face images account for about half of the samples, as they are distributed roughly evenly across the identities; thus each identity contains both occluded and unoccluded face images. At the same time, we ensure that each face image retains effective recognition features, avoiding large-area occlusions that would make the image completely unrecognizable.

In view of the limitations of the occluder templates, we improve the subjective diversity of occluders by adding various real-life textures (about 30 types) to the known templates. Multiple occlusion types prevent the deep learning model from becoming too sensitive to particular types of occluders.

Fig. 4: The distribution of the Webface-OCC dataset for various mask types.

III. EXPERIMENTS

A. Experimental Settings

Datasets. In experiments, we use the large CASIA-Webface dataset [10] and our built Webface-OCC dataset for training, and other face datasets (LFW [5], CFP-FP [12], AgeDB-30 [13], as well as the recently proposed masked face recognition datasets LFW-mask, CFP-FP-mask, AgeDB-30-mask, and RMFRD [14]) for testing. Webface is a large-scale face recognition dataset with up to 10,000 subjects and 500,000 faces, and is thus suitable for model training. LFW is a public face verification benchmark dataset under unconstrained conditions. CFP-FP contains 7,000 images of 500 identities, each with 10 frontal and 4 non-frontal images. AgeDB-30 covers subjects of different ages. The LFW-mask, CFP-FP-mask, and AgeDB-30-mask datasets are the results of adding masks to the original datasets; the data size and scale remain unchanged. The RMFRD dataset contains 4,015 face images of 426 people at a size of 250 × 250 pixels, each with a normal face and several masked face images. The dataset is further organized into 7,178 masked and non-masked sample pairs, including 3,589 pairs of the same identity and 3,589 pairs of different identities.

Benchmarks. To validate the effect of occlusion face recognition on the existing masked face recognition datasets and provide an evaluation reference for researchers using the dataset, we retrain and compare six different CNN-based face recognition models. The six baseline models include CenterFace [15], SphereFace [2], FaceNet [1], CosFace [16], and ArcFace [3], along with an occlusion face recognition model, MaskNet [17]. FaceNet and ArcFace are retrained into two versions using the public CASIA-Webface dataset [10] and our built dataset, respectively.

Evaluation Metrics. We evaluate the test models with precision and the receiver operating characteristic (ROC) curve for face recognition.

B. Implementation Details

For all face recognition models that need to be retrained, we employ the refined ResNet50 model proposed in ArcFace [3] as our baseline CNN model. Our implementation is based on the PyTorch deep learning framework, running on two NVIDIA 2080ti (12GB) GPUs. In training, the batch size is set to 128, and the training process finishes at 32K iterations. We extract 512-dimension features for each normalized face in testing. To prevent over-fitting and improve the trained models' generalization, we perform data augmentation on the training set, such as flipping.

The open-source RetinaFace [18] is used to detect occluded faces in the raw images and obtain 68 facial landmarks. After performing the similarity transformation accordingly, we align the face images and resize them to 112 × 112 pixels.

C. Results on Face Verification

We evaluate the models trained on our dataset strictly following the standard protocol of unrestricted with labeled outside data [5] and report the mean accuracy on test image pairs.

As shown in Table I, * means that the model is retrained on our dataset. We divide the methods into two categories: general face recognition and occlusion face recognition. As the results show, due to the influence of poses and ages, the accuracy on CFP-FP and AgeDB-30 is far lower than on LFW. Obviously, the general models trained on Webface perform well on general face images. As an occlusion face recognition model, MaskNet shows a significant accuracy reduction on the general face recognition datasets due to its additional occlusion elimination operations. It is worth noting that the model trained on our dataset still performs outstandingly on the general face recognition datasets, being only about 1% less accurate than the original model.

In comparison, the recognition accuracy on the masked face recognition datasets clearly shows the superiority of our dataset. On further examination, the retrained models still significantly outperform the original models (FaceNet and ArcFace). Specifically, relative to the original ArcFace model, the accuracy of the retrained model on the four masked face recognition datasets rises by 36.22%, 29.14%, 27.04%, and 15.03%, respectively. This is a remarkable gain for the face recognition task. Experiments show that our retrained model significantly improves performance on the occlusion face recognition datasets without reducing the original effect on the general face recognition datasets.

Simultaneously, compared to the tests on simulated masked face images, the recognition accuracy of all methods on the real masked face dataset is significantly reduced. The large gap between the simulated masked faces and real ones can be attributed to the following facts. For real-world masked faces, it is difficult to distinguish unknown occlusions
TABLE I: Comparisons on face verification (%) on LFW, CFP-FP, AgeDB-30 and RMFRD datasets. * denotes a retrained version.

Method           LFW [5]  CFP-FP [12]  AgeDB-30 [13]  LFW-mask [14]  CFP-FP-mask [14]  AgeDB-30-mask [14]  RMFRD [14]
CenterFace [15]     -         -             -             56.63           56.35              57.12              -
SphereFace [2]    99.11     94.38         91.70           58.22           57.11              57.02              -
FaceNet [1]       99.05     94.12         91.26           59.68           57.36              57.89              -
CosFace [16]      99.51     95.44         94.56           60.70           58.18              58.14            61.06
ArcFace [3]       99.53     95.56         95.15           60.86           58.04              59.03            63.22
MaskNet [17]      93.86     80.56         84.23           83.22           75.63              72.62            68.78
FaceNet*          97.98     93.20         90.19           95.87           85.63              84.01            77.56
ArcFace*          99.01     93.58         93.27           97.08           87.18              86.07            78.25
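As a quick sanity check, the per-dataset gains quoted in Section III-C for the retrained ArcFace follow directly from the masked-dataset columns of Table I:

```python
# Accuracy (%) from Table I on the four masked face recognition datasets.
arcface = {"LFW-mask": 60.86, "CFP-FP-mask": 58.04,
           "AgeDB-30-mask": 59.03, "RMFRD": 63.22}
arcface_star = {"LFW-mask": 97.08, "CFP-FP-mask": 87.18,
                "AgeDB-30-mask": 86.07, "RMFRD": 78.25}

# Absolute accuracy gains of the retrained model over the original,
# matching the 36.22 / 29.14 / 27.04 / 15.03 figures quoted in the text.
gains = {k: round(arcface_star[k] - arcface[k], 2) for k in arcface}
assert gains == {"LFW-mask": 36.22, "CFP-FP-mask": 29.14,
                 "AgeDB-30-mask": 27.04, "RMFRD": 15.03}
```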

Fig. 5: ROC curves of face verification on LFW-mask, AgeDB-30-mask and CFP-FP-mask datasets.
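The ROC evaluation behind Fig. 5 amounts to sweeping a threshold over pair-similarity scores and reading off the true positive rate at a fixed false positive rate (the text discusses the FPR = 1e-3 operating point). A minimal sketch, with synthetic scores standing in for real model outputs (only the 3,589/3,589 pair counts are taken from Section III-A):

```python
import numpy as np

def tpr_at_fpr(scores, labels, target_fpr):
    """Pick the threshold at which roughly target_fpr of impostor pairs
    are (wrongly) accepted, then report the genuine-pair accept rate."""
    neg = np.sort(scores[labels == 0])
    k = max(int(np.ceil(target_fpr * len(neg))), 1)
    threshold = neg[-k]
    return float((scores[labels == 1] >= threshold).mean())

# Toy scores standing in for cosine similarities of 3,589 genuine and
# 3,589 impostor pairs (the RMFRD pair counts from Section III-A).
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0.7, 0.1, 3589),   # genuine pairs
                         rng.normal(0.3, 0.1, 3589)])  # impostor pairs
labels = np.concatenate([np.ones(3589, int), np.zeros(3589, int)])

tpr = tpr_at_fpr(scores, labels, 1e-3)  # operating point discussed in the text
```

Repeating this for every threshold (equivalently, every target FPR) traces out the full ROC curve shown in Fig. 5.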

accurately, and the dataset itself mostly comes from public figures, who deliberately disguise themselves to avoid revealing their identity.

To further verify the performance of our model, we then evaluate the results with the ROC indicator. The two retrained models are selected for display; the results on the LFW-mask, CFP-FP-mask, and AgeDB-30-mask datasets are shown in Fig. 5. Again, our retrained model is able to maintain a certain accuracy and stability when the FPR (false positive rate) is above 1e-3. Admittedly, the accuracy drops sharply when the FPR is below 1e-3, mainly due to the incomplete facial features.

IV. CONCLUSION

This research proposes a simulated occlusion face recognition dataset. We specially design an occlusion synthesis method and apply it to the existing Webface dataset, thus obtaining a large number of occluded face images. Furthermore, we reasonably combine the original normal face images and the occluded face images to get our final dataset. Experimental results show that ArcFace retrained on our dataset gives 97.08% and 78.25% accuracy on the simulated face datasets and the real-world masked face dataset, respectively, substantially outperforming the counterparts. In the future, we will further develop a universal occlusion face recognition algorithm on this basis.

REFERENCES

[1] F. Schroff, D. Kalenichenko, and J. Philbin, "Facenet: A unified embedding for face recognition and clustering," in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815–823.
[2] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song, "Sphereface: Deep hypersphere embedding for face recognition," in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6738–6746.
[3] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, "Arcface: Additive angular margin loss for deep face recognition," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4685–4694.
[4] B. Liu, W. Deng, Y. Zhong, M. Wang, J. Hu, X. Tao, and Y. Huang, "Fair loss: Margin-aware reinforcement learning for deep face recognition," in IEEE International Conference on Computer Vision, 2019, pp. 10052–10061.
[5] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments," in Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition, 2008.
[6] E. J. He, J. A. Fernandez, B. V. K. V. Kumar, and M. I. Alkanhal, "Masked correlation filters for partially occluded face recognition," in International Conference on Acoustics, Speech, and Signal Processing, 2016, pp. 1293–1297.
[7] L. Song, D. Gong, Z. Li, C. Liu, and W. Liu, "Occlusion robust face recognition based on mask learning with pairwise differential siamese network," in IEEE International Conference on Computer Vision, 2019, pp. 773–782.
[8] S. Ge, C. Li, S. Zhao, and D. Zeng, "Occluded face recognition in the wild by identity-diversity inpainting," IEEE Transactions on Circuits and Systems for Video Technology, 2020.
[9] Y. Shen, J. Gu, X. Tang, and B. Zhou, "Interpreting the latent space of gans for semantic face editing," in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 9240–9249.
[10] D. Yi, Z. Lei, S. Liao, and S. Z. Li, "Learning face representation from scratch," arXiv preprint, 2014.
[11] A. Bulat and G. Tzimiropoulos, "How far are we from solving the 2d & 3d face alignment problem? (and a dataset of 230,000 3d facial landmarks)," in IEEE International Conference on Computer Vision, 2017, pp. 1021–1030.
[12] S. Sengupta, J. Chen, C. D. Castillo, V. M. Patel, R. Chellappa, and D. W. Jacobs, "Frontal to profile face verification in the wild," in Workshop on Applications of Computer Vision, 2016, pp. 1–9.
[13] S. Moschoglou, A. Papaioannou, C. Sagonas, J. Deng, I. Kotsia, and S. Zafeiriou, "Agedb: The first manually collected, in-the-wild age database," in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1997–2005.
[14] Z. Wang, G. Wang, B. Huang, Z. Xiong, Q. Hong, H. Wu, P. Yi, K. Jiang, N. Wang, and Y. Pei, "Masked face recognition dataset and application," arXiv preprint arXiv:2003.09093, 2020.
[15] Y. Wen, K. Zhang, Z. Li, and Y. Qiao, "A discriminative feature learning approach for deep face recognition," in European Conference on Computer Vision, 2016, pp. 499–515.
[16] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu, "Cosface: Large margin cosine loss for deep face recognition," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5265–5274.
[17] W. Wan and J. Chen, "Occlusion robust face recognition based on mask learning," in IEEE International Conference on Image Processing, 2017, pp. 3795–3799.
[18] J. Deng, J. Guo, Y. Zhou, J. Yu, I. Kotsia, and S. Zafeiriou, "Retinaface: Single-stage dense face localisation in the wild," arXiv preprint, 2019.
