2103.02805v1
Fig. 2: Samples of occlusion face images in Webface-OCC. The first row shows normal faces, and the second and third rows are their
corresponding occlusion faces.
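The occlusion synthesis illustrated in Fig. 2 amounts to compositing an occluder template onto an aligned face crop. The following is a minimal sketch of that idea in Python/NumPy; the RGBA template format, the function name, and the paste position are our own assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def apply_occluder(face, occluder_rgba, top_left):
    """Alpha-composite an RGBA occluder template onto an aligned face crop.

    face: (H, W, 3) uint8 aligned face image (e.g. 112x112).
    occluder_rgba: (h, w, 4) uint8 template (mask/glasses patch with alpha).
    top_left: (row, col) where the template is pasted.
    """
    out = face.astype(np.float32).copy()
    r, c = top_left
    h, w = occluder_rgba.shape[:2]
    rgb = occluder_rgba[..., :3].astype(np.float32)
    alpha = occluder_rgba[..., 3:4].astype(np.float32) / 255.0
    region = out[r:r + h, c:c + w]
    # Blend template over the face region according to its alpha channel.
    out[r:r + h, c:c + w] = alpha * rgb + (1.0 - alpha) * region
    return out.astype(np.uint8)

# Tiny synthetic example: a 112x112 grey "face" and an opaque dark patch
# covering its lower half, like the mouth/nose occlusions shown in Fig. 2.
face = np.full((112, 112, 3), 128, dtype=np.uint8)
mask = np.zeros((56, 112, 4), dtype=np.uint8)
mask[..., 3] = 255  # fully opaque occluder
occluded = apply_occluder(face, mask, (56, 0))
```

Varying the pasted texture (as the paper does with about 30 real-life textures) then only changes the RGB channels of the template, not the compositing step.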
face image can have effective recognition features, and avoid large-area occlusions that cause the image to be completely unrecognizable.

In view of the limitations of the occluder templates, we improve the visual diversity of the occluders by adding various real-life textures (about 30 types) to the known templates. The multiple occlusion types prevent the deep learning model from becoming overly sensitive to any particular type of occluder.

Fig. 4: The distribution of the Webface-OCC dataset for various mask types.

III. EXPERIMENTS

A. Experimental Settings

Datasets. In the experiments, we use the large CASIA-Webface dataset [10] and our Webface-OCC dataset for training, and other face datasets (LFW [5], CFP-FP [12], AgeDB-30 [13]), as well as the recently proposed masked face recognition datasets (LFW-mask, CFP-FP-mask, AgeDB-30-mask, RMFRD) [14], for testing. Webface is a large-scale face recognition dataset with up to 10,000 subjects and 500,000 faces, and is thus suitable for model training. LFW is a public face verification benchmark dataset collected under unconstrained conditions. CFP-FP contains 7,000 images of 500 identities, each with 10 frontal and 4 non-frontal images. AgeDB-30 covers subjects of different ages. The LFW-mask, CFP-FP-mask and AgeDB-30-mask datasets are obtained by adding masks to the original datasets; the data size and scale remain unchanged. The RMFRD dataset contains 4,015 face images of 426 people at a size of 250×250 pixels, each with one normal face and several masked face images. The dataset is further organized into 7,178 masked and non-masked sample pairs, including 3,589 pairs of the same identity and 3,589 pairs of different identities.

Benchmarks. To validate the effect of occlusion face recognition on the existing masked face recognition datasets and to provide an evaluation reference for researchers using the dataset, we retrain and compare six different CNN-based face recognition models. The six baseline models include CenterFace [15], SphereFace [2], FaceNet [1], CosFace [16] and ArcFace [3], along with an occlusion face recognition model, MaskNet [17]. FaceNet and ArcFace are each retrained in two versions, using the public CASIA-Webface dataset [10] and our built dataset, respectively.

B. Implementation Details

For all face recognition models that need to be retrained, we employ the refined ResNet50 model proposed in ArcFace [3] as our baseline CNN model. Our implementation is based on the PyTorch deep learning framework, running on two NVIDIA 2080Ti (12GB) GPUs. In training, the batch size is set to 128, and the training process finishes at 32K iterations. In testing, we extract a 512-dimension feature for each normalized face. To prevent over-fitting and improve the trained models' generalization, we perform data augmentation, such as flipping, on the training set.

The open-source RetinaFace [18] is used to detect occluded faces in the raw images and obtain 68 facial landmarks. After performing the corresponding similarity transformation, we align the face images and resize them to 112×112 pixels.

C. Results on Face Verification

We evaluate the models trained on our dataset strictly following the standard protocol of unrestricted with labeled outside data [5], and report the mean accuracy on the test image pairs.

In Table I, * means that the model is retrained on our dataset. We divide the methods into two categories: general face recognition and occlusion face recognition. As the results show, due to the influence of poses and ages, the accuracy on CFP-FP and AgeDB-30 is far lower than on LFW. As expected, the general models trained on Webface perform well on general face images. As an occlusion face recognition model, MaskNet shows a significant drop in accuracy on the general face recognition datasets, due to its additional occlusion elimination operations. It is worth noting that the model trained on our dataset still performs strongly on the general face recognition datasets, being only about 1% less accurate than the original model.

In comparison, the recognition accuracy on the masked face recognition datasets clearly shows the superiority of our dataset. On further examination, the retrained models still significantly outperform the original models (FaceNet and ArcFace). Specifically, relative to the original ArcFace model, the accuracy of the retrained model on the four masked face recognition datasets rises by 36.22%, 29.14%, 27.04% and 15.03%, respectively. This is a remarkable gain for the face recognition task. The experiments show that our retrained model brings a significant improvement on the occlusion face recognition datasets, without reducing the original performance on the general face recognition datasets.

Simultaneously, compared to the tests on simulated masked face images, the recognition accuracy of all methods on the real masked face dataset is significantly reduced. The large gap between the simulated masked faces and the real ones can be attributed to the following facts. For real-world masked faces, it is difficult to distinguish unknown occlusions
TABLE I: Comparisons on face verification (%) on the LFW, CFP-FP and AgeDB-30 datasets, their masked variants, and RMFRD. * denotes a retrained version.
Method LFW [5] CFP-FP [12] AgeDB-30 [13] LFW-mask [14] CFP-FP-mask [14] AgeDB-30-mask [14] RMFRD [14]
CenterFace [15] - - - 56.63 56.35 57.12 -
SphereFace [2] 99.11 94.38 91.70 58.22 57.11 57.02 -
FaceNet [1] 99.05 94.12 91.26 59.68 57.36 57.89 -
CosFace [16] 99.51 95.44 94.56 60.70 58.18 58.14 61.06
ArcFace [3] 99.53 95.56 95.15 60.86 58.04 59.03 63.22
MaskNet [17] 93.86 80.56 84.23 83.22 75.63 72.62 68.78
FaceNet(*) 97.98 93.20 90.19 95.87 85.63 84.01 77.56
ArcFace(*) 99.01 93.58 93.27 97.08 87.18 86.07 78.25
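As a quick arithmetic check on Table I, the reported gains of the retrained ArcFace over the original one on the four masked benchmarks can be recomputed directly from the table entries (a small sketch; the dictionary names are ours):

```python
# Accuracies (%) from Table I: original vs. retrained (*) ArcFace on the
# four masked face verification sets.
arcface      = {"LFW-mask": 60.86, "CFP-FP-mask": 58.04,
                "AgeDB-30-mask": 59.03, "RMFRD": 63.22}
arcface_star = {"LFW-mask": 97.08, "CFP-FP-mask": 87.18,
                "AgeDB-30-mask": 86.07, "RMFRD": 78.25}

# Absolute gain in accuracy on each benchmark.
gains = {k: round(arcface_star[k] - arcface[k], 2) for k in arcface}
# gains == {'LFW-mask': 36.22, 'CFP-FP-mask': 29.14,
#           'AgeDB-30-mask': 27.04, 'RMFRD': 15.03}
```

These match the 36.22%, 29.14%, 27.04% and 15.03% improvements quoted in the text.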
Fig. 5: ROC curves of face verification on LFW-mask, AgeDB-30-mask and CFP-FP-mask datasets.
accurately, and the dataset itself mostly comes from public figures, who deliberately disguise themselves to avoid revealing their identity.

To further verify the performance of our model, we then evaluate the results with the ROC indicator. The two retrained models are selected for display, and the results on the LFW-mask, CFP-FP-mask and AgeDB-30-mask datasets are shown in Fig. 5. Again, our retrained model is able to maintain a certain accuracy and stability when the FPR (false positive rate) is more than 1e-3. Admittedly, the accuracy drops sharply when the FPR is less than 1e-3, mainly due to the incomplete facial features.

IV. CONCLUSION

This research proposes a simulated occlusion face recognition dataset. We specially design an occlusion synthesis method and apply it to the existing Webface dataset, thus obtaining a large number of occluded face images. Furthermore, we reasonably combine the original normal face images and the occluded face images to get our final dataset. Experimental results show that ArcFace retrained on our dataset gives 97.08% and 78.25% accuracy on the simulated face datasets and the real-world masked face dataset, respectively, substantially outperforming the counterparts. In the future, we will further develop a universal occlusion face recognition algorithm on this basis.

REFERENCES

[1] F. Schroff, D. Kalenichenko, and J. Philbin, "Facenet: A unified embedding for face recognition and clustering," in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815–823.
[2] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song, "Sphereface: Deep hypersphere embedding for face recognition," in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6738–6746.
[3] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, "Arcface: Additive angular margin loss for deep face recognition," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4685–4694.
[4] B. Liu, W. Deng, Y. Zhong, M. Wang, J. Hu, X. Tao, and Y. Huang, "Fair loss: Margin-aware reinforcement learning for deep face recognition," in IEEE International Conference on Computer Vision, 2019, pp. 10052–10061.
[5] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments," in Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition, 2008.
[6] E. J. He, J. A. Fernandez, B. V. K. V. Kumar, and M. I. Alkanhal, "Masked correlation filters for partially occluded face recognition," in International Conference on Acoustics, Speech, and Signal Processing, 2016, pp. 1293–1297.
[7] L. Song, D. Gong, Z. Li, C. Liu, and W. Liu, "Occlusion robust face recognition based on mask learning with pairwise differential siamese network," in IEEE International Conference on Computer Vision, 2019, pp. 773–782.
[8] S. Ge, C. Li, S. Zhao, and D. Zeng, "Occluded face recognition in the wild by identity-diversity inpainting," IEEE Transactions on Circuits and Systems for Video Technology, 2020.
[9] Y. Shen, J. Gu, X. Tang, and B. Zhou, "Interpreting the latent space of gans for semantic face editing," in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 9240–9249.
[10] D. Yi, Z. Lei, S. Liao, and S. Z. Li, "Learning face representation from scratch," arXiv preprint, 2014.
[11] A. Bulat and G. Tzimiropoulos, "How far are we from solving the 2d & 3d face alignment problem? (and a dataset of 230,000 3d facial landmarks)," in IEEE International Conference on Computer Vision, 2017, pp. 1021–1030.
[12] S. Sengupta, J. Chen, C. D. Castillo, V. M. Patel, R. Chellappa, and D. W. Jacobs, "Frontal to profile face verification in the wild," in Workshop on Applications of Computer Vision, 2016, pp. 1–9.
[13] S. Moschoglou, A. Papaioannou, C. Sagonas, J. Deng, I. Kotsia, and S. Zafeiriou, "Agedb: The first manually collected, in-the-wild age database," in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1997–2005.
[14] Z. Wang, G. Wang, B. Huang, Z. Xiong, Q. Hong, H. Wu, P. Yi, K. Jiang, N. Wang, and Y. Pei, "Masked face recognition dataset and application," arXiv preprint arXiv:2003.09093, 2020.
[15] Y. Wen, K. Zhang, Z. Li, and Y. Qiao, "A discriminative feature learning approach for deep face recognition," in European Conference on Computer Vision, 2016, pp. 499–515.
[16] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu,
“Cosface: Large margin cosine loss for deep face recognition,” in IEEE
Conference on Computer Vision and Pattern Recognition, 2018, pp.
5265–5274.
[17] W. Wan and J. Chen, “Occlusion robust face recognition based on mask
learning,” in IEEE International Conference on Image Processing, 2017,
pp. 3795–3799.
[18] J. Deng, J. Guo, Y. Zhou, J. Yu, I. Kotsia, and S. Zafeiriou, "Retinaface: Single-stage dense face localisation in the wild," arXiv preprint, 2019.