
Vulnerabilities of Unattended Face Verification Systems to Facial Components-based Presentation Attacks: An Empirical Study

Published: 23 November 2021

Abstract

As face presentation attacks (PAs) are realistic threats to unattended face verification systems, face presentation attack detection (PAD) has been intensively investigated in recent years, and the latest advances in face PAD have significantly reduced the success rate of such attacks. In this article, an empirical study of a novel and effective face impostor PA is presented. In the proposed PA, a facial artifact is created from the most vulnerable facial components, which are optimally selected based on a vulnerability analysis of different facial components to impostor PAs. An attacker can launch a face PA by presenting the facial artifact on his or her own real face. With a collected PA database containing various types of artifacts and presentation attack instruments (PAIs), the experimental results and analysis show that the proposed PA poses a more serious threat to face verification and PAD systems than print, replay, and mask PAs. Moreover, the generalization ability of the proposed PA and the vulnerability of commercial systems are also investigated by evaluating unknown face verification and real-world PAD systems. This work provides a new paradigm for the study of face PAs.

1 Introduction

Face verification is a technology for automatically verifying an identity claim by comparing a probe face image with a reference face image in the gallery. In the enrollment process, a representation of an authorized user’s facial biometric characteristic is enrolled as a reference in a face verification system. During verification, a probe face image is captured and then used for identity authentication. Compared with traditional identity verification approaches like a PIN code (what you know) or a token (what you have), face verification is more secure and user friendly because it is based on the concept of what you are, meaning that the authentication factor cannot be forgotten or delegated. With the advent of deep learning, face recognition performance has improved significantly over the last decade. Currently, face verification is widely deployed in airports, hotels, and banks, and on smartphones for unlocking devices and applications. Although daily life can benefit from automatic face verification, these unattended face verification applications are also potentially exposed to attacks.
Since the algorithms and internal structures of a face verification system are generally unknown, an adversary can hardly invade the system in an intrusive way. However, the external cameras used for capturing probe face images are directly exposed to all users, and they can be maliciously attacked by an adversary in a non-intrusive way. Whereas a legitimate enrolled subject presents a bona fide face for identity verification, an adversary may present a presentation attack instrument (PAI) to interfere with the system policy, for example by deliberately hiding the facial area with accessories or by wearing a mask. A face presentation attack (PA, also known as a spoofing attack or direct attack) is a typical non-intrusive attack, launched by presenting a facial artifact to the externally accessible sensors or cameras of a capture device [38]. As large amounts of face photos and videos are posted online, an adversary can easily acquire a person’s facial biometric information to create an artifact [26]. According to the attack goal, face PAs can be further classified into concealer attacks (avoiding a successful match) and impostor attacks (impersonating another identity). For impostor attacks, print attacks and replay attacks are two commonly used low-cost attack mechanisms, and the vulnerability of face verification systems to them was validated in [26]. However, existing work on face presentation attack detection (PAD, also known as anti-spoofing, spoofing detection, or liveness detection) has already achieved reasonable performance in detecting such attacks [23], and the generalization ability of PAD methods has also been marginally improved [48]. Recently, custom silicone 3D masks were introduced as PAIs, and their threats were demonstrated in [39]. Nevertheless, producing realistic masks is very expensive, and such masks can still be detected by using multi-spectral capture devices [14, 45].
To enhance the security and robustness of a face verification system, a PAD module has gradually become an indispensable part of it. Therefore, to launch a PA, an adversary needs to bypass both the PAD and the face verification module. To successfully bypass a face verification module, a facial artifact used for identity impersonation must be highly similar to a target victim’s face. For face PAD, the existing PAD methods mainly focus on the essential differences between bona fide presentations and attack ones in terms of structure, texture, image quality, motion pattern, and color distortion. Thus, to evade PAD, an adversary should simulate the above characteristics of a bona fide presentation as much as possible; namely, the differences between a facial artifact and a human face should be minimized.
The robustness of face verification algorithms has improved significantly over the last decade. For instance, in occluded face verification, a nearly 100% recognition rate is achieved even when 80% of the face is occluded [25]. Motivated by this, to maliciously compromise a face verification system at low cost, a simple but effective impostor PA based on facial components is proposed in this article. The main idea of our approach is to create a facial artifact using only the most vulnerable facial components rather than the entire facial area. Meanwhile, to optimize the construction of the proposed component-based facial artifacts, the vulnerability of different facial components to impostor PAs is analyzed. In the proposed PA, an attack presentation is therefore composed of a facial artifact and the attacker’s own real face, which serve identity impersonation and bypassing PAD, respectively. The main contributions of this work are summarized as follows.
(1) Based on the vulnerability analysis of different facial components to impostor presentation attacks, a novel face PA based on facial components is proposed. It can pass face presentation attack detection while impersonating a victim’s identity.
(2) A database for presentation attacks based on facial components is built. It contains 63 subjects, four facial artifact areas (nose, midface, upper face, and central face), three smartphones (Honor 9, Huawei Mate 9 Pro, and iPhone X), and two presentation attack instruments (A4 paper and photo paper).
(3) An empirical study is performed with the proposed PA and the self-built database. The results indicate that the proposed attack outperforms the existing print, replay, and mask presentation attacks in terms of identity impersonation and evasion of presentation attack detection, and it generalizes better than eyeglass-frame-based adversarial examples.
The rest of the article is organized as follows. Related work and background are introduced in Section 2. The proposed attack is illustrated in Section 3. Data collection is described in Section 4. Face PAD methods used for evaluations are described in Section 5. Experimental results and analysis are presented in Section 6. Finally, some conclusions are drawn in Section 7.

2 Related Work and Background

According to the standard ISO/IEC 30107-1 [17], attacks with face artifacts are presentation attacks (a.k.a. non-intrusive attacks or outside attacks). Currently, face PAs [38] and face morphing attacks [12] are typical non-intrusive attacks. Face PAs target the capture subsystem, and they are launched during a verification (or identification) process. As this article focuses on non-intrusive impostor PAs, these attacks and their countermeasures are reviewed in the following.

2.1 Face Presentation Attacks

In face PA benchmark databases [8, 55], print attacks, replay attacks, and mask attacks are commonly used as impostor (impersonation) attacks. However, since many efforts have been devoted to face PAD, the performance in countering such attacks has reached a high level [19, 23]. Although the detection performance in cross-database tests (trained on one database and tested on another) is still limited, the generalization ability of face PAD methods has improved compared to early research [48]. Besides the above attacks, a cut-photo attack was introduced in the CASIA FASD [55] and CASIA-SURF [54] databases. In this attack, the eye region of a facial artifact is cut out and replaced by the attacker’s real eyes to simulate eye blinking. However, excellent detection results have been achieved for this attack with existing PAD schemes [28]. Although cropped mask and partial paper attacks were introduced in the newly collected ROSE-Youtu [24] and SiW-M [30] databases, they pose only a limited threat to existing face PAD methods. Recently, a database of age-induced makeup-based concealer attacks was built in [22], and the testing results show that makeup attacks are difficult to detect.
In addition to the attacks in face PA benchmark databases, Xu et al. proposed an attack based on 3D face model reconstruction and virtual reality [50]. It first builds a 3D face model from a human face and then synchronously simulates facial motions and expressions of a human face via virtual reality. This attack can effectively evade challenge response-based PAD methods, but the facial artifact is still presented in a digital display, which can be prevented by the use of structure and depth information. On the basis of the vulnerability of deep learning models to adversarial examples, some physical adversarial examples were successively created using eyeglass frames [41, 42], infrared light [57], and digital screens [53] to spoof deep-learning-based face verification and PAD methods. However, these methods are not suitable for black-box attack scenarios, where model architectures and parameters are unknown to adversaries, because they are required for generating these adversarial examples. Chen et al. proposed to bypass face verification using malicious makeup; it is highly inconspicuous since a human face is directly used for launching attacks [6]. Nevertheless, it needs an accomplice who has much knowledge about makeup. Shen et al. designed a visible light-based physical adversarial example to spoof face recognition [43]. Agarwal et al. proposed to deceive the existing PAD algorithms via image transforms [1], but their evaluations are not made with commercial systems. After that, Nguyen et al. further investigated light projection attacks and conducted a feasibility study with two open-source and one commercial face recognition systems [32]. Recently, a partial face tampering attack was proposed to mislead face verification, but it only targets the digital domain [31].

2.2 Face Presentation Attack Detection

Based on different presentation attack detection clues, the existing face PAD methods can be classified into texture-based methods, image-quality-based methods, structure-based methods, color-distortion-based methods, motion-based methods, and liveness-based methods. A taxonomy of face PAD clues is shown in Figure 1.
Fig. 1. A taxonomy of face presentation attack detection (PAD) clues.
Texture-based methods focus on the detection of textural and appearance discrepancies between human skin and PAIs like photos, digital screens, and masks [34, 35]. Image-quality-based methods focus on the detection of quality distortion, which is caused by recapturing faces presented in photos and digital displays [13]. The clues exploited for structure-based methods are the structural differences between 3D human faces and 2D PAIs (e.g., paper, screen) with a plain surface [20, 47]. Color-distortion-based methods judge liveness from the perspective of color distribution and chromatic diversity [3, 36]. The detection principle of motion-based methods is rigid and non-rigid motion distinctions between attack and bona fide presentations [56]. Liveness-based methods identify attack presentations from bona fide ones through challenge response [27] or physiological signals [29]. Apart from handcrafted features, deep-learning-based methods adaptively extract features via convolutional neural networks. Nevertheless, their model architectures and loss functions still depend on the above-mentioned clues, such as depth information [37], lattice and reflection artifacts [52], albedo and shape properties [9], color degradation [46], and material properties of PAIs [51], because these clues are essential for differentiating between an attack presentation and a bona fide presentation.

2.3 Threat Model

It is assumed that an attacker aims to bypass an unattended automatic face verification system; that is, the attacker tries to illegally gain access to the system by maliciously impersonating an authorized user. Meanwhile, the facial similarity between the attacker and the target victim is limited, which means the attacker cannot directly match the enrolled biometric reference of the target victim. The system is equipped with only an RGB camera, while both face verification and PAD modules are installed; they are used for identity authentication and for preventing attack presentations, respectively. To launch an impostor PA, the attacker needs to bypass both the face verification and PAD modules. It should also be mentioned that this work focuses on a black-box attack scenario, where the inside of the system (e.g., internal configurations, training data, deep learning model architectures and parameters, feature extraction, and classification algorithms) is totally unknown to the attacker. Although PAD results are reported in the evaluations, these results are not used for constructing facial artifacts in this work. All the attacker observes is the match or non-match verification result output by the system. The inside of the system is assumed to be secure and its integrity preserved, so that any manipulation would be detected; that is, the system is immune to intrusive attacks, and only non-intrusive attacks are relevant here. Furthermore, the attacker owns only one face image of the target victim, which can be downloaded from social media or captured by a camera from a long distance. However, the attacker has no accomplices for launching a morphing attack and has only limited financial resources, insufficient to create a realistic face mask, undergo plastic surgery, or hire a makeup artist for malicious makeup. The workflow of a face verification system is shown in Figure 2.
Fig. 2. The workflow of a face verification system.

3 Face Presentation Attacks Based On Facial Components

The notations in Section 3 are summarized in Table 1.
Notation | Description
\(F_{PAD}\) | presentation attack detection module of the system
\(th_{PAD}\) | threshold of presentation attack detection
\(F_{PAD}(\cdot)\) | output score of \(F_{PAD}\); \(F_{PAD}(\cdot) \gt th_{PAD}\) means an attack presentation
\(F_{VER}\) | face verification module of the system
\(th_{VER}\) | threshold of face verification
\(F_{VER}(\cdot , \cdot)\) | output score of \(F_{VER}\); \(F_{VER}(\cdot , \cdot) \gt th_{VER}\) means the input face images are matched
\(I_{ATT}\) | image of the attacker’s bona fide facial representation
\(I_{REF}\) | target victim’s reference face image enrolled in the system
\(I_{VIC}\) | facial artifact image of the target victim’s facial representation
\(G\) | facial artifact generator
\(G(I_{ATT}, I_{VIC})\) | facial components-based artifact
\(M_{C}\) | binary mask operator of the components (from \(I_{VIC}\)) in \(G(I_{ATT}, I_{VIC})\)
\(A_{C}\) | area of the components (from \(I_{VIC}\)) in \(G(I_{ATT}, I_{VIC})\)
\(D_{M}\) | mask density of \(M_{C}\)
\(hmc\) | height of \(M_{C}\)
\(wmc\) | width of \(M_{C}\)
\(D_{l}\) | lower bound of \(D_{M}\) for identity impersonation
\(D_{u}\) | upper bound of \(D_{M}\) for evading presentation attack detection
\(m_{o}\) | optimal choice of \(M_{C}\)
\(d_{m}\) | mask density of \(m_{o}\)
\(C\) | component parameter for generating \(G(I_{ATT}, I_{VIC})\)
\(S\) | scale parameter for generating \(G(I_{ATT}, I_{VIC})\)
\(P\) | \(P = (px, py)\), position parameter for generating \(G(I_{ATT}, I_{VIC})\)
\(wm\) | width of \(I_{ATT}\)
\(hm\) | height of \(I_{ATT}\)
\(wa\) | width of \(I_{VIC}\)
\(ha\) | height of \(I_{VIC}\)
Table 1. Summary of the Notations in Section 3

3.1 Motivation

An attacker who aims to launch an impostor PA against a face verification system presents his or her own facial characteristics directly to the capture device, and the corresponding face image \(I_{ATT}\) is captured by the system. The captured image is then fed into the PAD module of the system and subsequently into the face verification module. When the attacker interacts with the capture device as a bona fide presentation, we have
\begin{equation} F_{PAD}(I_{ATT}) \lt th_{PAD}, \end{equation}
(1)
\begin{equation} F_{VER}(I_{ATT}, I_{REF}) \le th_{VER}, \end{equation}
(2)
where \(I_{REF}\) is a reference face image enrolled by a target victim in the system in advance; \(F_{PAD}\) and \(F_{VER}\) are the face PAD and face verification modules of the system, respectively; and \(th_{PAD}\) and \(th_{VER}\) are the thresholds for face PAD and face verification, respectively. The face PAD module \(F_{PAD}\) takes a probe facial image as input. If the output score is lower than \(th_{PAD}\), the input facial image is a bona fide presentation; otherwise, it is an attack presentation. The face verification module \(F_{VER}\) takes a probe face image (e.g., \(I_{ATT}\)) and a reference face image from the gallery (e.g., \(I_{REF}\)) as input. If the computed comparison score is higher than \(th_{VER}\), both facial representations match (e.g., \(I_{ATT}\) and \(I_{REF}\) are from the same subject); otherwise, the system determines that they do not match. In that case, the attacker’s own face (bona fide) will likely be accepted by the PAD module, but it will still be stopped by the face verification module unless the facial similarity between the attacker and the victim is high, which leads to an unsuccessful attack.
If the attacker presents a facial artifact of the target victim to the capture subsystem, we obtain
\begin{equation} F_{PAD}(I_{VIC}) \ge th_{PAD}, \end{equation}
(3)
\begin{equation} F_{VER}(I_{VIC}, I_{REF}) \gt th_{VER}, \end{equation}
(4)
where \(I_{VIC}\) is a facial artifact image captured by the system. In that case, the facial artifact can spoof the verification module, but it can still be caught by the PAD module, which also leads to a failed attack.
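To make the two-stage gate that the attacker must pass explicit, the following minimal sketch summarizes the decision logic of Equations (1)–(4); `f_pad`, `f_ver`, and the thresholds are hypothetical placeholders rather than the actual commercial modules.

```python
def system_decision(probe_img, reference_img, f_pad, f_ver, th_pad, th_ver):
    """Two-stage decision of an unattended face verification system.

    f_pad(probe) returns a PAD score; scores above th_pad are treated as
    attack presentations (Equations (1)/(3)).  f_ver(probe, reference)
    returns a comparison score; scores above th_ver are treated as a match
    (Equations (2)/(4)).  All of these are hypothetical placeholders.
    """
    if f_pad(probe_img) > th_pad:
        return "reject: attack presentation detected"
    if f_ver(probe_img, reference_img) > th_ver:
        return "accept: identity claim verified"
    return "reject: probe does not match the reference"
```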
From the above analysis, to bypass both the PAD module \(F_{PAD}\) and the face verification module \(F_{VER}\), an intuitive solution is to effectively combine the attacker’s real face \(I_{ATT}\) and the facial artifact of the victim \(I_{VIC}\). Here, the face verification module \(F_{VER}\) is implemented by face comparison: it takes a probe face image (e.g., \(I_{ATT}\)) and a reference face image from the gallery (e.g., \(I_{REF}\)) as input and computes a comparison score \(F_{VER}(\cdot , \cdot)\) measuring the facial similarity between the two input face images, where a higher score indicates higher similarity. Thus, for a successful attack impersonating the target victim, \(F_{VER}(\cdot , \cdot) \gt th_{VER}\) is required. It is worth noting that some face verification systems may instead use a distance or dissimilarity score, where a higher value indicates lower facial similarity; in that case, \(F_{VER}(\cdot , \cdot) \lt th_{VER}\) would be required for a successful impersonation, but this convention is not used in this article. The attack can then be formulated as
\begin{equation} \begin{split}& \text{minimize} \ \left\Vert G(I_{ATT}, I_{VIC}) - I_{ATT} \right\Vert \\ & \text{subject to} \ F_{PAD}\left(G(I_{ATT}, I_{VIC}) \right) \lt th_{PAD} \ \text{and} \ F_{VER}\left(G(I_{ATT}, I_{VIC}), I_{REF} \right) \gt th_{VER},\end{split} \end{equation}
(5)
where \(G\) is a facial artifact generator. It takes the attacker’s real face \(I_{ATT}\) and the facial artifact of the victim \(I_{VIC}\) as input and generates a facial artifact combining both of them. However, as PAD module \(F_{PAD}\) and face verification module \(F_{VER}\) are unknown to the attacker, it is difficult to generate a facial artifact \(G(I_{ATT}, I_{VIC})\) via a white-box learning-based method, which needs the architecture and the parameters of a target model. Although there are black-box adversarial attacks against machine learning systems such as HopSkipJumpAttack [7], it is still difficult to directly apply HopSkipJumpAttack to this black-box setting. The applicability of HopSkipJumpAttack to this black-box setting is analyzed as follows.
(1) HopSkipJumpAttack targets an image classifier, which takes a single image as input and predicts its class label, while the face verification module takes a probe image and a reference image as input and outputs a comparison score for the pair. Since the number of input images of the target model of HopSkipJumpAttack (one) differs from that of the face verification module (two), HopSkipJumpAttack cannot be directly applied to attack the face verification module.
(2) For HopSkipJumpAttack, the function of the target model is image classification, while the face verification module performs facial representation extraction and comparison. Although image classification includes a similar step (image feature extraction), it does not include facial representation comparison. Since the function of the target model differs from that of the face verification module, HopSkipJumpAttack cannot be directly applied to attack the face verification module.
(3) When attacking a real-world face verification system, an attacker is only allowed a limited number of attempts (e.g., 10). Since HopSkipJumpAttack requires at least 1,000 model queries, the attacker would have to run it against the output of a substitute face verification system, such as a public deep learning network. In this scenario, it is important that HopSkipJumpAttack possess good generalization ability to unknown systems. However, as HopSkipJumpAttack does not focus on this scenario, this aspect was not investigated in [7].
Inspired by occluded face recognition, an empirical study approach is proposed in this article to solve Equation (5). The core concept behind our approach is to change the construction of facial artifacts from the holistic face level to the component level, where an artifact \(G(I_{ATT}, I_{VIC})\) is constructed from the most vulnerable facial components of the target victim’s representation \(I_{VIC}\). In occluded face recognition, even if part of a probe face is occluded, it can still be correctly matched with a reference image from the gallery, which indicates that a partial face is sufficient for face recognition [25]. From an attacker’s perspective, it is therefore reasonable to assume that a partial face is also sufficient for launching an impostor PA. To find the most vulnerable facial components for generating the facial artifact \(G(I_{ATT}, I_{VIC})\), the attacker’s real face \(I_{ATT}\) (bona fide) is used as the initial state, and components from the victim’s representation \(I_{VIC}\) are incrementally combined with \(I_{ATT}\) to create the PAI. For simplicity, it is assumed that the components of \(I_{VIC}\) are linearly added to \(I_{ATT}\), and we have
\begin{equation} G(I_{ATT}, I_{VIC}) = I_{ATT} + M_{C} \cdot I_{VIC}, \end{equation}
(6)
where \(M_{C}\) is a binary mask operator of the components, and it is defined as
\begin{equation} M_{C}(x, y) = \left\lbrace \begin{aligned}1 & , & (x, y) \in A_{C}, \\ 0 & , & (x, y) \notin A_{C},\end{aligned} \right. \end{equation}
(7)
where \(A_{C}\) is the corresponding area of the components. To quantify the contribution of the artifact \(I_{VIC}\) in \(G(I_{ATT}, I_{VIC})\) , the mask density \(D_{M} \in [0, 1]\) of \(M_{C}\) is defined as
\begin{equation} D_{M} = \left(\sum _{i=1}^{hmc} \sum _{j=1}^{wmc} M_{C}(i,j) \right)/ \left(hmc \times wmc \right)\!, \end{equation}
(8)
where \(hmc\) and \(wmc\) are the height and width of \(M_{C}\) , respectively. A higher value of \(D_{M}\) means that a facial artifact \(G(I_{ATT}, I_{VIC})\) is more dominated by the victim’s artifact \(I_{VIC}\) . In other words, with the increase of \(D_{M}\) , \(G(I_{ATT}, I_{VIC})\) is more similar to a victim’s face, but it has fewer characteristics of a bona fide presentation.
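The composition of Equation (6) and the mask density of Equation (8) can be sketched digitally as follows. Note that Equation (6) is written as a linear addition, while the generation procedure in Section 3.2 replaces the masked region of \(I_{ATT}\) with the victim’s pixels; the NumPy sketch below (assuming aligned images of identical size) follows that replacement view.

```python
import numpy as np

def compose_artifact(i_att, i_vic, mask):
    """Digital counterpart of Equation (6): keep the attacker's pixels where
    the binary component mask M_C is 0 and the victim's pixels where it is 1.

    i_att, i_vic: aligned face images of identical shape (H, W, 3).
    mask:         binary mask M_C of shape (H, W) with values in {0, 1}.
    """
    m = mask.astype(i_att.dtype)[..., None]   # broadcast over color channels
    return i_att * (1 - m) + i_vic * m

def mask_density(mask):
    """Mask density D_M of Equation (8): fraction of pixels covered by M_C."""
    return float(mask.sum()) / mask.size
```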
When the components of \(I_{VIC}\) are incrementally added to the attacker’s real face \(I_{ATT}\) , there exists a lower bound \(D_{l}\) of the mask density \(D_{M}\) according to Equation (2) and Equation (4), which enables the generated artifact in Equation (6) to impersonate the victim’s identity, and we obtain
\begin{equation} \begin{split}\text{Bool}\left(F_{VER}(I_{ATT} + M_{C} \cdot I_{VIC}, I_{REF}) \gt th_{VER} \right) = \left\lbrace \begin{aligned}&\text{True}, & D_{M} \ge D_{l}, \\ &\text{False}, & D_{M} \lt D_{l}.\end{aligned} \right.\end{split} \end{equation}
(9)
Similarly, from Equations (1) and (3), there exists an upper bound \(D_{u}\) of the mask density \(D_{M}\) , which enables the generated artifact in Equation (6) to bypass PAD, and we get
\begin{equation} \begin{split}\text{Bool}\left(F_{PAD}(I_{ATT} + M_{C} \cdot I_{VIC}) \lt th_{PAD} \right) = \left\lbrace \begin{aligned}& \text{True}, & D_{M} \le D_{u}, \\ & \text{False}, & D_{M} \gt D_{u}.\end{aligned} \right.\end{split} \end{equation}
(10)
According to the existing occluded face recognition [25], around a 97% recognition rate can be achieved with only 10% visible face area. Thus, it is reasonable to assume that the lower bound \(D_{l}\) for identity impersonation is a small value (e.g., 0.1). Under this condition, if \(D_{u} \lt D_{l}\) , a bona fide presentation with some normal accessories (e.g., eyeglasses, long beard, cosmetic contact lenses, eye blacks, fake eyelashes) may be misclassified as an attack. Therefore, \(D_{l} \le D_{u}\) is more desirable. In that case, an optimal component mask \(m_{o}\) with a density between \(D_{l}\) and \(D_{u}\) can be found for generating a facial artifact \(G(I_{ATT}, I_{VIC})\) , and we have
\begin{equation} \begin{split}\left\lbrace \begin{aligned}& F_{PAD}(I_{ATT} + m_{o} \cdot I_{VIC}) \lt th_{PAD}, \\ & F_{VER}(I_{ATT} + m_{o} \cdot I_{VIC}, I_{REF}) \gt th_{VER}, \\ \end{aligned} \right. \quad D_{l} \le d_{m} \le D_{u},\end{split} \end{equation}
(11)
where \(d_{m}\) is the mask density of \(m_{o}\) .
In the generated facial artifact \(G(I_{ATT}, I_{VIC})\), the components \(m_{o} \cdot I_{VIC}\) (\(d_{m} \ge D_{l}\)) carry the principal part of the facial biometric characteristics of the target victim, and they are devised for spoofing the face verification module \(F_{VER}\). Meanwhile, as the mask density satisfies \(d_{m} \le D_{u}\), the face area presented to the system is still dominated by \(I_{ATT}\) (the attacker’s bona fide face representation), and hence the PAD module \(F_{PAD}\) will not be triggered. As illustrated in Table 2, compared to existing artifacts (e.g., photos, replayed videos, masks), the components-based PAI is much harder to distinguish from a bona fide presentation in terms of structure, texture, image quality, motion, color distortion, and liveness characteristics. In addition, it can be easily implemented at low cost.
PAs | Print | Replay | 3D Mask | Proposed | Bona Fide
PAIs | papers | digital displays | masks | papers | -
Cost | low | low | high\(^{1}\) | low | -
Liveness | no | partial | partial | yes | yes
Challenge response | no | no | yes | yes | yes
Image quality | recaptured | recaptured | single captured | mostly single captured | single captured
Structure | 2D | 2D | 3D | mostly 3D | 3D
Motion | static | partial | no subtle motions | yes | yes
Color distortion | CMYK | RGB | retouching | mostly skin color | skin color
Appearance | artifacts | artifacts | artifacts | mostly skin | skin
Table 2. Comparison of Different Face Presentation Attacks (PAs)
\(^{1}\) Approximately USD 4,000 per mask in [39].

3.2 The Proposed Face Presentation Attacks

In this article, a facial artifact \(G(I_{ATT}, I_{VIC})\) is generated, where the attacker’s bona fide face representation \(I_{ATT}\) is used as the initial condition and is combined with the victim’s facial representation \(I_{VIC}\) using different component, position, scale, shape, and intensity parameters. The parameters are determined by minimizing the distance between \(G(I_{ATT}, I_{VIC})\) and \(I_{ATT}\), so that the generated facial artifact \(G(I_{ATT}, I_{VIC})\) not only bypasses the face verification module but also deceives the PAD module. As a result, the generated facial artifact \(G(I_{ATT}, I_{VIC})\) can effectively exploit both the attacker’s own face \(I_{ATT}\) (bona fide) and the victim’s facial artifact \(I_{VIC}\) (facial representation of the victim), so it serves both impersonation and bypassing PAD. After generating a digital facial artifact, a physical PAI of the generated artifact is printed at the size of a human face, and the component that comes from the victim’s facial artifact is cut out. Finally, the cropped facial-components-based artifact is positioned onto the corresponding area of the attacker’s real face, and the attacker can present it to the system to launch an impostor PA. The proposed facial-components-based PA is illustrated in Figure 3.
Fig. 3. The proposed facial-components-based presentation attack.
To solve Equation (5), this article conducts digital simulation experiments to empirically optimize the generation of a facial-components-based artifact \(G(I_{ATT}, I_{VIC})\). In the digital simulation, the generation of \(G(I_{ATT}, I_{VIC})\) is parameterized by a facial component \(C\), a scale parameter \(S\), and a position parameter \(P\). Thus, the minimization objective of Equation (5) can be addressed by optimizing these three variables (\(C\), \(S\), \(P\)), which are used for minimizing the distance between the image of the facial-components-based artifact \(G(I_{ATT}, I_{VIC})\) and the image of the attacker’s bona fide facial representation \(I_{ATT}\). A lower distance between \(G(I_{ATT}, I_{VIC})\) and \(I_{ATT}\) corresponds to a lower proportion of the facial artifact in the entire face area; this minimizes the characteristics of an attack presentation and thus better evades presentation attack detection. The norm used in the minimization objective is the \(L_{2}\)-norm.
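As a concrete illustration of this objective, a minimal sketch (assuming the two images are aligned NumPy arrays of identical shape) of the \(L_{2}\) residual between the composed artifact and the attacker’s bona fide image is given below.

```python
import numpy as np

def artifact_distance(g, i_att):
    """L2 residual of the objective in Equation (5) between the composed
    artifact G(I_ATT, I_VIC) and the attacker's bona fide image I_ATT.
    A smaller residual means a smaller artifact footprint and therefore
    fewer attack-presentation cues."""
    diff = g.astype(np.float64) - i_att.astype(np.float64)
    return float(np.linalg.norm(diff))
```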
To select an optimal combination of the facial component \(C\), the scale parameter \(S\), and the position parameter \(P\), this article defines a search space for each of them and digitally generates facial-components-based artifacts using the components and parameters defined in the search spaces. With the generated digital artifacts, digital simulation experiments are carried out to evaluate their performance in identity impersonation. After that, the facial component \(C\), the scale parameter \(S\), and the position parameter \(P\) are empirically selected according to the digital simulation results (the performance of identity impersonation). The search spaces are defined as follows.
(1) Since eyes, nose, and mouth are basic facial components, they are chosen for constructing different components-based artifacts. Thus, the search space of the facial component \(C\) contains eyes, nose, mouth, forehead, combination of eyes and nose, upper face, midface, lower face, central face, and whole face (baseline)-based artifacts, which are denoted by \(C\) = {eyes, nose, mouth, forehead, eyes & nose, upper, mid, lower, central, whole}. The facial components and areas are determined by 68 dlib landmarks [21], and the indices of the landmarks are listed in Table 3.
Landmarks | Eyes | Nose | Mouth | Forehead | Upper | Midface | Lower | Central
Left | 36 | 31 | 48 | 0 | 0 | 0 | 3 | 36
Right | 45 | 35 | 54 | 16 | 16 | 16 | 13 | 45
Top | 37 | 28 | 51 | (19 + 24)/2 – 30 + 27 | 27 \(\times\) 2 – 33 | 28 | 51 | 37
Bottom | 46 | 33 | 57 | (19 + 24)/2 | 28 | 51 | 8 | 33
Table 3. Landmark (Dlib) Indices of Different Facial-Components-based Artifacts
(2) The search space of the scale parameter \(S\) is formulated by \(S_{i}\) = 1.0 + 0.2 \(i\) , \(i\) = 0, 1, 2, \(\dots\) , 7. The search of \(S\) is used for minimizing the proportion of the components-based artifact in the entire face area.
(3) The search space of the position parameter \(P\) is formulated by \(P\) = ( \(px\) , \(py\) ), where the value of \(P\) satisfies \(px, py \in \lbrace -10, 0, 10\rbrace\) , which are horizontal and vertical shifts, respectively. The original reference positions of these shifts are the original dlib landmarks listed in Table 3. The search of \(P\) is used for finding an optimal position of the components-based artifact onto the entire face area.
The above three parameters are optimized according to digital simulation results. For simplicity, the shapes of the artifacts are set as rectangles. Moreover, to preserve more properties of the facial biometric characteristics (e.g., skin color), the original intensities of the victim’s artifact are directly used.
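The empirical selection over these search spaces can be viewed as an exhaustive grid search. A minimal sketch is given below; `generate_artifact` and `f_ver` are hypothetical placeholders for the generator of this section and a face verification scorer (e.g., a wrapper around a commercial API), and the recorded success fraction corresponds to the IAPMR reported in Section 6.

```python
import itertools

# Search spaces defined above.
COMPONENTS = ["eyes", "nose", "mouth", "forehead", "eyes_nose",
              "upper", "mid", "lower", "central", "whole"]
SCALES = [1.0 + 0.2 * i for i in range(8)]                    # 1.0, 1.2, ..., 2.4
POSITIONS = [(px, py) for px in (-10, 0, 10) for py in (-10, 0, 10)]

def grid_search(pairs, generate_artifact, f_ver, th_ver):
    """For every (C, S, P) combination, generate digital artifacts for all
    (attacker, victim, reference) triplets in `pairs` and record the fraction
    of successful impersonations."""
    results = {}
    for c, s, p in itertools.product(COMPONENTS, SCALES, POSITIONS):
        hits = sum(f_ver(generate_artifact(i_att, i_vic, c, s, p), i_ref) > th_ver
                   for i_att, i_vic, i_ref in pairs)
        results[(c, s, p)] = hits / len(pairs)
    return results
```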
For a given component \(C\) , a scale \(S\) , and a position \(P\) = ( \(px\) , \(py\) ), a components-based artifact \(G(I_{ATT}, I_{VIC})\) can be generated from an attacker’s face image \(I_{ATT}\) and a victim’s facial artifact \(I_{VIC}\) . The details of the generation are described as follows.
(1) For an attacker’s face \(I_{ATT}\) and a victim’s facial artifact \(I_{VIC}\) , the component \(C\) is first located to obtain its top left coordinates ( \(ml\) , \(mt\) ) and ( \(al\) , \(at\) ) as well as right bottom coordinates ( \(mr\) , \(mb\) ) and ( \(ar\) , \(ab\) ) in \(I_{ATT}\) and \(I_{VIC}\) , respectively.
(2) In the victim’s facial artifact \(I_{VIC}\) , the size of the component \(C\) is scaled with the scale \(S\) , and its location in \(I_{ATT}\) after scaling is
\begin{equation} asl = \text{max}\left(0, (al+ar)/2 - (ar-al)/2 \times S \right)\!, \end{equation}
(12)
\begin{equation} ast = \text{max}\left(0, (at+ab)/2 - (ab-at)/2 \times S \right)\!, \end{equation}
(13)
\begin{equation} asr = \text{min}\left((al+ar)/2 + (ar-al)/2 \times S, wa \right)\!, \end{equation}
(14)
\begin{equation} asb = \text{min}\left((at+ab)/2 + (ab-at)/2 \times S, ha \right)\!, \end{equation}
(15)
where ( \(asl\) , \(ast\) ) and ( \(asr\) , \(asb\) ) are the top left and the right bottom coordinates of \(C\) in \(I_{VIC}\) after scaling, and \(wa\) and \(ha\) are the width and the height of \(I_{VIC}\) , respectively. The victim’s facial artifact \(I_{VIC}\) is cropped with ( \(asl\) , \(ast\) ) and ( \(asr\) , \(asb\) ), and it can obtain
\begin{equation} C_{A} = I_{VIC}(ast:asb, asl:asr), \end{equation}
(16)
where \(C_{A}\) is cropped from the victim’s artifact and it only contains the component \(C\) .
(3) In the attacker’s face \(I_{ATT}\) , the component \(C\) is scaled with the scale \(S\) and shifted by the position \(P\) = ( \(px\) , \(py\) ), respectively, and its location in \(I_{ATT}\) after scaling and shifting is
\begin{equation} msl = \text{max}\left(0, (ml+mr)/2 - (mr-ml)/2 \times S + px \right)\!, \end{equation}
(17)
\begin{equation} mst = \text{max}\left(0, (mt+mb)/2 - (mb-mt)/2 \times S + py \right)\!, \end{equation}
(18)
\begin{equation} msr = \text{min}\left((ml+mr)/2 + (mr-ml)/2 \times S + px, wm \right)\!, \end{equation}
(19)
\begin{equation} msb = \text{min}\left((mt+mb)/2 + (mb-mt)/2 \times S + py, hm \right)\!, \end{equation}
(20)
where ( \(msl\) , \(mst\) ) and ( \(msr\) , \(msb\) ) are the top left and the right bottom coordinates of \(C\) in \(I_{ATT}\) after scaling and shifting, and \(wm\) and \(hm\) are the width and the height of \(I_{ATT}\) , respectively.
(4) The size of the artifact \(C_{A}\) is normalized to ( \(msr\) - \(msl\) ) \(\times\) ( \(msb\) - \(mst\) ) pixels for aligning its size with that of the attacker’s face \(I_{ATT}\) .
(5) In the attacker’s face \(I_{ATT}\) , the corresponding area of component \(C\) is replaced by the artifact \(C_{A}\) to generate the components-based artifact \(G(I_{ATT}, I_{VIC})\) in the digital domain. For given pixel indices ( \(ax\) , \(ay\) ) in \(G(I_{ATT}, I_{VIC})\) , if \(msl \lt ax \lt msr\) and \(mst \lt ay \lt msb\) , it is calculated by
\begin{equation} G(I_{ATT}, I_{VIC})(ax, ay) = C_{A}(ax-msl,ay-mst); \end{equation}
(21)
otherwise, it is calculated by
\begin{equation} G(I_{ATT}, I_{VIC})(ax, ay) = I_{ATT}(ax,ay), \end{equation}
(22)
where \(ax\) and \(ay\) are the row and the column pixel indices in \(G(I_{ATT}, I_{VIC})\) , respectively.
(6) Finally, \(G(I_{ATT}, I_{VIC})\) is printed with a size of a human face. After that, the region of the component \(C\) is cut out and it is pasted onto the corresponding area of the attacker’s real face. To launch an impostor PA, the attacker presents the components-based artifact to a face verification system. At the same time, the attacker’s digital face \(I_{ATT}\) in \(G(I_{ATT}, I_{VIC})\) is physically replaced by the attacker’s own face.
The steps for generating the proposed attack are provided in Algorithm 1.
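The six generation steps above can be sketched in Python as follows. This is an illustrative sketch, not the authors’ reference implementation: the landmark-to-box mapping follows Table 3 for a few example components only, the 68-point landmarks are assumed to come from dlib, and OpenCV is assumed for resizing.

```python
import cv2

def component_box(lm, component):
    """Bounding box (left, top, right, bottom) of a component from a (68, 2)
    dlib landmark array `lm`, following Table 3 (only a few components shown)."""
    if component == "nose":
        return lm[31, 0], lm[28, 1], lm[35, 0], lm[33, 1]
    if component == "midface":
        return lm[0, 0], lm[28, 1], lm[16, 0], lm[51, 1]
    if component == "central":
        return lm[36, 0], lm[37, 1], lm[45, 0], lm[33, 1]
    raise ValueError(f"unsupported component: {component}")

def generate_artifact(i_att, i_vic, lm_att, lm_vic, component, s, p=(0, 0)):
    """Digital generation of G(I_ATT, I_VIC) following Equations (12)-(22)."""
    ha, wa = i_vic.shape[:2]
    hm, wm = i_att.shape[:2]
    px, py = p

    # Step 2: scale the component box in the victim's artifact, Eqs. (12)-(15).
    al, at, ar, ab = component_box(lm_vic, component)
    asl = max(0, int((al + ar) / 2 - (ar - al) / 2 * s))
    ast = max(0, int((at + ab) / 2 - (ab - at) / 2 * s))
    asr = min(int((al + ar) / 2 + (ar - al) / 2 * s), wa)
    asb = min(int((at + ab) / 2 + (ab - at) / 2 * s), ha)
    c_a = i_vic[ast:asb, asl:asr]                          # Eq. (16)

    # Step 3: scale and shift the component box in the attacker's face, Eqs. (17)-(20).
    ml, mt, mr, mb = component_box(lm_att, component)
    msl = max(0, int((ml + mr) / 2 - (mr - ml) / 2 * s + px))
    mst = max(0, int((mt + mb) / 2 - (mb - mt) / 2 * s + py))
    msr = min(int((ml + mr) / 2 + (mr - ml) / 2 * s + px), wm)
    msb = min(int((mt + mb) / 2 + (mb - mt) / 2 * s + py), hm)

    # Step 4: resize the cropped victim component to the attacker's box size.
    c_a = cv2.resize(c_a, (msr - msl, msb - mst))

    # Step 5: replace the corresponding area in the attacker's face, Eqs. (21)-(22).
    g = i_att.copy()
    g[mst:msb, msl:msr] = c_a
    return g
```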
The practical use cases and application scenarios of facial-components-based presentation attacks are elaborated as follows.
(1) The standard ISO/IEC 30107-1 [17] points out that there are unattended biometric recognition applications equipped with automated presentation attack detection methods. “In unattended applications, such as remote authentication over open networks, automated presentation attack detection methods could be applied to mitigate the risks of attack.” The proposed attack can be launched to threaten the security of these unattended applications.
(2) In many real-world automated gates at apartments, companies, school buildings, and bus and train stations, face verification systems are deployed for verifying biometric claims. To save human resources, some of these systems operate in an unsupervised manner during the verification process.\(^{1}\) When no biometric attendant is around these systems, an attacker may use the proposed attack to gain unauthorized access.
(3) Nowadays, many mobile devices, such as cell phones and tablets, are equipped with face unlocking systems. If these mobile devices are lost or stolen (it is reported that over 70 million cell phones are lost each year\(^{2}\)), the face unlocking systems might be compromised by the proposed attack.

4 Database

4.1 Digital Facial Artifacts

In this work, frontal face images of 63 Chinese subjects (36 males, 27 females) are used for generating digital facial artifacts. Most of the participants are university students, and their ages range from 18 to 26. The face images are captured by a Canon EOS 600D camera and normalized to a size of 256 \(\times\) 384 pixels. This resolution is only used for the digital simulation, while a resolution of 1080 \(\times\) 1920 pixels is used for the physical implementation experiments. During the generation of digital facial artifacts, for a given facial component \(C\), scale \(S\), and position \(P\) = ( \(px\) , \(py\) ), each subject acts as an attacker in turn, while the remaining 62 subjects are considered as target victims. As a result, for each set of the above parameters, 63 \(\times\) 62 = 3,906 digital facial artifacts are generated. Examples of the digital facial artifacts with different facial components, scales, and positions are shown in Figure 4 and Figure 5, respectively. In most cases, the proportion of the attacker in the entire face area is higher than 70%, which indicates that the characteristics of a bona fide presentation are largely preserved.
Fig. 4. Digital artifacts with different facial components (the percentages are the proportions of artifacts in whole face areas). (a) Eyes. (b) Nose. (c) Mouth. (d) Forehead. (e) Eyes and nose. (f) Upper face. (g) Midface. (h) Lower face. (i) Central face. (j) Whole face (baseline).
Fig. 5. Digital nose-based artifacts with different scales \(S\) and positions \(P\) = ( \(px\) , \(py\) ) (the percentages are the proportions of artifacts in whole face areas). (a) \(S\) = 1.2. (b) \(S\) = 1.4. (c) \(S\) = 1.6. (d) \(S\) = 2.0. (e) \(S\) = 2.4. (f) \(P\) = (0, 10). (g) \(P\) = (0, \(-\) 10). (h) \(P\) = (10, 0). (i) \(P\) = ( \(-\) 10, 0). (j) \(P\) = (10, 10).

4.2 Physical Facial Artifacts

After generating digital artifacts, components-based artifacts are physically created through printing, cutting, and pasting. First, a digital artifact is printed at the size of a human face. After that, the area of the component is cut out, and the component is attached onto the corresponding area of an attacker’s face. For the creation of the physical PAIs, the face images of the victims are the same as those used in digital artifact generation. One male subject acts as the attacker, while the other 62 subjects are considered as target victims. For each target victim, four components-based artifacts are physically created: nose-, midface-, upper-face-, and central-face-based artifacts. Among them, the scale of nose-based artifacts is set to \(S\) = 1.4; the scales of midface-, upper-face-, and central-face-based artifacts are set to \(S\) = 1.2; and the positions of all artifacts are set to \(P\) = (0, 0).
To print components-based artifacts, normal A4 paper and Epson glossy photo paper are used as PAIs, printed by Ricoh MP C6003 (4800 \(\times\) 1200 dpi) and Epson XP-860 printers, respectively. In the data collection, 5-second 1080p videos (1080 \(\times\) 1920 pixels) are captured by the frontal cameras of Honor 9, Huawei Mate 9 Pro, and iPhone X smartphones, simulating impostor PAs in the smartphone face unlocking process. All data are collected in an indoor environment with outdoor lighting, and the distance between the smartphone and the human face is about 15 to 20 cm. The artifacts printed on A4 paper are captured by all three smartphones, while the artifacts printed on photo paper are only captured by the Huawei Mate 9 Pro. As a result, (4 \(\times\) 3 + 4) \(\times\) 62 = 992 videos of physical impostor PAs are collected, and some examples are shown in Figure 6.
Fig. 6. Physical facial artifacts collected by iPhone X. (a) Nose. (b) Central face. (c) Midface. (d) Upper face.
A comparison between the collected database and the databases used in existing impostor presentation attacks is listed in Table 4. According to Table 4, the collected database is the largest among those used in existing research.
Presentation Attacks (PAs) | Number of Subjects\(^{1}\) | #Attack Presentations\(^{1}\) | Capture Devices
accessorize to a crime [41]\(^{2}\) | 3 attackers, 5 victims | 180–300 images | Canon T4i
AGN [42]\(^{2}\) | 3 attackers, 9 victims | 12 videos | Canon T4i
infrared light [57]\(^{2}\) | 1 attacker, 4 victims | 4 images | surveillance cameras
replay attacks [53]\(^{2}\) | 20 victims | 800 images | Samsung Note 9
visible light [43]\(^{2}\) | 9 subjects | 720 images | -
light projections [32]\(^{2}\) | 50 subjects | 90 attempts | a web camera
proposed\(^{2}\) | 1 attacker, 62 victims | 992 videos | 3 smartphones
Table 4. Comparison of Different Databases
\(^{1}\) Only impostor presentation attacks (in the physical domain) are considered in counting the number of data subjects and attack presentations.
\(^{2}\) The presentation attack instruments (PAIs) of [41, 42] are eyeglass frames with glossy papers. The PAI of [57] is a cap with infrared LEDs. The PAI of [53] is an Acer KA240HQ digital screen. The PAIs of [32, 43] are visible light patterns. The PAIs used in this article are A4 and photo papers.

5 Presentation Attack Detection Methodology

In the experiments, five PAD methods are used for evaluating the effectiveness of the proposed attack, and they are briefly described as follows.
(1) Texture [35]. This method focuses on texture clues. Guided scale local binary pattern and uniform local binary pattern are extracted as features, and linear support vector machine (SVM) is used for classification.
(2) Quality [13]. This method focuses on quality clues. Fourteen full-reference image quality measures based on pixel difference, correlation, and edge are used for feature extraction, and linear discriminant analysis is used for classification.
(3) Structure [20]. This method focuses on structure clues. Local speed patterns are utilized to extract local patterns of the diffusion speed as features, and linear SVM is used for classification.
(4) Color [3]. This method focuses on color distortion clues. Co-occurrence of adjacent local binary patterns and local phase quantization are extracted from HSV and YCbCr color spaces as features, and linear SVM is used for classification.
(5) Motion [56]. This method focuses on motion clues. Volume local binary count patterns are proposed to extract motion information of facial videos, and SVM is used for classification.
Although challenge-response-based PAD methods [27, 47] achieve good detection performance, they are not used for evaluation because their training data are not publicly available. Furthermore, these two methods cannot be trained with the public PAD databases, because they require specialized training data like device motion information or light projection information.
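For illustration, a minimal texture-based PAD pipeline in the spirit of these methods is sketched below, using uniform LBP histograms from scikit-image and a linear SVM from scikit-learn; the actual feature of [35] (guided scale LBP) and the training protocols of the five evaluated methods differ from this simplified sketch.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import LinearSVC

def lbp_histogram(gray_face, p=8, r=1):
    """Uniform LBP histogram of a grayscale face crop."""
    lbp = local_binary_pattern(gray_face, P=p, R=r, method="uniform")
    hist, _ = np.histogram(lbp, bins=p + 2, range=(0, p + 2), density=True)
    return hist

def train_texture_pad(bona_fide_faces, attack_faces):
    """Train a linear SVM to separate bona fide (label 0) from attack (label 1)
    presentations based on LBP texture histograms."""
    faces = list(bona_fide_faces) + list(attack_faces)
    features = np.array([lbp_histogram(f) for f in faces])
    labels = np.array([0] * len(bona_fide_faces) + [1] * len(attack_faces))
    return LinearSVC(C=1.0).fit(features, labels)

# A probe face would then be scored with clf.decision_function([lbp_histogram(face)])
# and compared against a threshold tuned at the equal error rate (Section 6.1).
```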

6 Experimental Results and Analysis

6.1 Experimental Setup

To demonstrate the effectiveness of the proposed PA against face verification, evaluations are made with two commercial face verification systems, Megvii Face++\(^{3}\) and Microsoft Azure Face,\(^{4}\) and a deep learning model, VGG-Face [33]. Following the recommendation for border control scenarios from FRONTEX [40], the threshold of Face++ is set at a false acceptance rate (FAR) of 0.1%, and Azure is used at its security setting. For VGG-Face, 4096-dimensional features are extracted by the pre-trained model after removing the last classification layer, and face verification is accomplished by thresholding the Euclidean distance between the extracted feature vectors. In our digital simulation, the component \(C\), scale \(S\), and position \(P\) = ( \(px\) , \(py\) ) are optimized based on the verification results from Face++ and VGG-Face. Azure is used as an unknown system for evaluating the generalization ability of the proposed attack.
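To make the VGG-Face protocol concrete, a minimal sketch is given below; `extract_descriptor` is a hypothetical placeholder for the pre-trained VGG-Face network with its last classification layer removed (returning the 4096-dimensional activation). Since this is a distance score, a pair is accepted when the distance falls below the threshold.

```python
import numpy as np

def verify_with_vggface(extract_descriptor, probe_img, reference_img, threshold):
    """Verification by comparing penultimate-layer descriptors.

    extract_descriptor(img) is a hypothetical function returning the 4096-D
    VGG-Face feature of a face crop.  Smaller Euclidean distance means higher
    facial similarity, so the claim is accepted below `threshold`.
    """
    d_probe = extract_descriptor(probe_img)
    d_ref = extract_descriptor(reference_img)
    return float(np.linalg.norm(d_probe - d_ref)) < threshold
```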
For the analysis of the detection accuracy of the face PAD module, five methods are used, and they are based on clues of texture [35], image quality [13], structure [20], color distortion [3], and motion [56], respectively. As there are significant variations of image qualities, PAs, and PAIs in CASIA FASD [55], it is used for training the PAD methods in this article. Moreover, the previous research has demonstrated that PAD methods trained on CASIA FASD can achieve relatively good cross-database test results [3, 35]. Following the cross-database test protocol [10], the training set and the testing set of CASIA FASD are respectively used for training detection models and tuning thresholds. The thresholds are set at equal error rate points, where the attack presentation classification error rate (APCER) is equal to the bona fide presentation classification error rate (BPCER). Experiments are also made with the thresholds set at a fixed APCER according to the biometric PAD standard [18].
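The equal-error-rate threshold tuning described above can be sketched as follows, assuming PAD scores where higher values indicate an attack presentation, consistent with the convention in Table 1.

```python
import numpy as np

def eer_threshold(bona_fide_scores, attack_scores):
    """Return the PAD threshold where APCER is (approximately) equal to BPCER.

    Scores above the threshold are classified as attack presentations.
    """
    bona_fide_scores = np.asarray(bona_fide_scores, dtype=float)
    attack_scores = np.asarray(attack_scores, dtype=float)
    candidates = np.sort(np.concatenate([bona_fide_scores, attack_scores]))
    best_th, best_gap = candidates[0], np.inf
    for th in candidates:
        apcer = np.mean(attack_scores <= th)      # attacks accepted as bona fide
        bpcer = np.mean(bona_fide_scores > th)    # bona fide rejected as attacks
        if abs(apcer - bpcer) < best_gap:
            best_th, best_gap = th, abs(apcer - bpcer)
    return float(best_th)
```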
To reduce the effects of redundant background, face areas are first cropped in all physical experiments. For the tests of the face PAD methods and the VGG-Face model, face detection is implemented with dlib [21], while Face++ and Azure perform face detection themselves.
Meanwhile, to simulate eye blinking of a bona fide presentation, eye areas of all upper- and central-face-based artifacts are cut out. Furthermore, motions (e.g., head movement, mouth opening) are also simulated by an attacker, and the presentation attack instrument is bent/warped to avoid planar effects.
To quantitatively measure the vulnerability of a face verification module, the impostor attack presentation match rate (IAPMR) [18] is used as the metric according to the International Standard ISO/IEC 30107-3, and it is calculated by
\begin{equation} \text{IAPMR} = IFA / (IFA + ITR), \end{equation}
(23)
where \(IFA\) and \(ITR\) represent the numbers of impostor attacks in which the target victim is matched and not matched, respectively. The higher the IAPMR is, the more vulnerable the face verification module is.
To measure the vulnerability of a face PAD module, the APCER [18] is used as the metric, and it is calculated by
\begin{equation} \text{APCER} = SFA / (SFA + STR), \end{equation}
(24)
where \(SFA\) and \(STR\) represent the numbers of impostor attacks in which the presentations are classified as bona fide and attack presentations, respectively. The higher the APCER is, the more vulnerable the face PAD module is. Since the collected database only contains attack presentations and does not contain bona fide presentations, BPCER is not reported in this article.
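Both metrics reduce to simple ratios of decision counts; a direct transcription of Equations (23) and (24) is given below.

```python
def iapmr(ifa, itr):
    """Equation (23): impostor attack presentation match rate,
    IFA / (IFA + ITR)."""
    return ifa / (ifa + itr)

def apcer(sfa, str_count):
    """Equation (24): attack presentation classification error rate,
    SFA / (SFA + STR)."""
    return sfa / (sfa + str_count)
```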

6.2 Experimental Results

In the experiments, digital simulation is first conducted to investigate the vulnerability of different facial components to impostor PAs, and then it optimally determines the parameters for facial artifact construction. After that, with these parameters, evaluations are performed with physical artifacts to demonstrate the effectiveness of the proposed PA against face verification and PAD systems.

6.2.1 Digital Simulation Results.

To optimize the position parameter \(P\) = ( \(px\) , \(py\) ) of facial artifacts, experiments are conducted on upper-face-based digital artifacts with different positions, and the results from Face++ and VGG-Face are shown in Figure 7. As Azure is used as an unknown face verification system for evaluations, it is not used in the digital simulation. From the results, the highest IAPMR is obtained with the original position \(P\) = (0, 0). When a facial artifact is horizontally or vertically shifted, the component of the artifact mixes with that of the attacker, which distorts the global facial structure and leads to a decrease in IAPMR. Therefore, the position \(P\) = (0, 0) is chosen to create facial artifacts in the following experiments.
Fig. 7. IAPMR (%) of different positions \(P\) = ( \(px\) , \(py\) ).
To optimize the component \(C\) and scale \(S\) of the facial artifacts, experiments are conducted on digital facial artifacts with different components and scales, and the results from Face++ and VGG-Face are listed in Table 5 and Table 6, respectively. Compared with eyes and mouth, higher IAPMR results are obtained with nose-based artifacts, indicating that the nose is more vulnerable to impostor PAs than other facial components. Meanwhile, upper-face-, midface-, and central-face-based artifacts achieve higher IAPMR results than lower-face-based artifacts. Moreover, as the scale \(S\) increases, the face area is increasingly governed by the facial biometric characteristics of the victim’s face representation, which leads to a higher IAPMR. Based on the above results, the nose with scale \(S\) = 1.4, and the upper face, midface, and central face with scale \(S\) = 1.2, are used for creating physical facial artifacts.
Scale \(S\) | Eyes | Nose | Mouth | Forehead | Eyes & Nose | Upper | Midface | Lower | Central | Whole
1.0 | 3.38 | 6.81 | 1.64 | 1.08 | 21.76 | 60.45 | 27.04 | 4.79 | 57.22 | 100.00
1.2 | 4.38 | 11.42 | 2.28 | 1.48 | 31.77 | 78.26 | 54.58 | 6.20 | 84.84 | 100.00
1.4 | 6.22 | 26.14 | 3.15 | 2.07 | 48.64 | 87.25 | 85.48 | 8.78 | 98.00 | 100.00
1.6 | 8.24 | 47.59 | 4.69 | 3.81 | 69.07 | 92.91 | 95.37 | 13.08 | 100.00 | 100.00
1.8 | 10.75 | 72.40 | 7.81 | 4.56 | 86.94 | 96.72 | 97.82 | 23.94 | 100.00 | 100.00
2.0 | 14.70 | 94.06 | 12.83 | 5.45 | 97.26 | 99.33 | 99.49 | 38.38 | 100.00 | 100.00
2.2 | 17.23 | 98.85 | 20.92 | 8.47 | -\(^{1}\) | 99.77 | 99.92 | 53.51 | 100.00 | 100.00
2.4 | 19.87 | 99.90 | 29.44 | 16.85 | -\(^{1}\) | 99.80 | 99.97 | 53.51 | 100.00 | 100.00
Mean | 10.60 | 57.15 | 10.35 | 5.47 | 59.24 | 89.31 | 82.46 | 25.27 | 92.51 | 100.00
Table 5. IAPMR (%) of Different Components \(C\) and Scales \(S\) with Face++
\(^{1}\) Eyes and nose are highly overlapped when scale \(S \gt\) 2.0.
Scale \(S\) | Eyes | Nose | Mouth | Forehead | Eyes & Nose | Upper | Midface | Lower | Central | Whole
1.0 | 0.05 | 0.00 | 0.00 | 0.13 | 0.15 | 17.46 | 0.51 | 0.05 | 1.56 | 71.38
1.2 | 0.05 | 0.08 | 0.00 | 0.46 | 0.56 | 32.10 | 1.95 | 0.05 | 4.58 | 90.07
1.4 | 0.23 | 0.31 | 0.00 | 1.05 | 2.48 | 45.08 | 6.81 | 0.05 | 19.94 | 94.85
1.6 | 0.59 | 1.02 | 0.03 | 4.74 | 5.33 | 55.22 | 13.54 | 0.05 | 48.57 | 96.93
1.8 | 1.13 | 2.97 | 0.03 | 8.91 | 10.47 | 63.08 | 21.45 | 0.28 | 65.80 | 97.82
2.0 | 1.84 | 7.86 | 0.15 | 13.11 | 17.31 | 74.01 | 32.51 | 1.18 | 74.86 | 98.31
2.2 | 2.76 | 17.28 | 0.31 | 17.97 | -\(^{1}\) | 84.18 | 48.03 | 1.84 | 84.87 | 98.95
2.4 | 3.71 | 29.60 | 0.74 | 23.30 | -\(^{1}\) | 88.20 | 66.69 | 3.05 | 91.94 | 99.31
Mean | 1.30 | 7.39 | 0.16 | 8.71 | 6.05 | 57.42 | 23.94 | 0.82 | 49.02 | 93.45
Table 6. IAPMR (%) of Different Components \(C\) and Scales \(S\) with VGG-Face
\(^{1}\) Eyes and nose are highly overlapped when scale \(S \gt\) 2.0.

6.2.2 Physical Implementation Results.

To demonstrate the vulnerability of face verification systems to the proposed PA, experiments are conducted with physical presentation attack instruments (facial artifacts), and the results from Face++ and VGG-Face are shown in Figure 8. As Azure is used as an unknown system for evaluations, its results are not listed here. From Figure 8, both Face++ and VGG-Face are vulnerable to the proposed attack, indicating that an enrolled user can be maliciously impersonated with an artifact created from facial components. Meanwhile, comparing the results of different components, the proportion of the nose in the whole face area is only around 15%, yet its IAPMR results are almost equal to those of the upper face, midface, and central face, which account for nearly 30% to 35% of the whole face area. This observation implies that a large share of the discriminative facial features is concentrated around the nose. Moreover, compared with the results of a single type of artifact, the impostor attack performance can be further improved with a compound attack, which combines nose-, upper-face-, midface-, and central-face-based artifacts. For the VGG-Face model, evaluations are also performed with predefined thresholds (FAR = 0.1%) tuned on the Labeled Faces in the Wild (LFW) [16] and Disguised Faces in the Wild (DFW) [44] databases. The proposed PA achieves IAPMR = 62.10% and 15.32% with the thresholds tuned on the LFW and DFW databases, respectively.
Fig. 8. IAPMR (%) of physical facial artifacts.
To investigate the detection accuracy of the face PAD methods to the proposed PA, the performance of physical facial artifacts is evaluated, and the results are listed in Table 7. It can be found that the existing PAD methods [3, 13, 20, 35, 56] achieve high error rates in detecting the proposed attack. It reveals that the properties of a bona fide presentation can be simulated by a components-based facial artifact. Moreover, since the size of a nose-based artifact is smaller than that of other types of artifacts, more properties of a bona fide presentation emerge when the nose-based artifact is utilized. Thus, a high APCER can be obtained with the nose-based artifacts.
Table 7. APCER (%) of Physical Facial Artifacts
Attacks | Nose | Central | Midface | Upper | Mean
Texture [35] | 99.46 | 82.26 | 77.42 | 100.00 | 89.79
Quality [13] | 84.95 | 82.26 | 95.70 | 93.01 | 88.98
Structure [20] | 93.55 | 92.48 | 93.01 | 99.46 | 94.63
Color [3] | 75.27 | 41.40 | 54.84 | 27.96 | 49.87
Motion [56] | 64.52 | 68.82 | 74.73 | 88.71 | 74.20
Mean | 83.55 | 73.44 | 79.14 | 81.83 | 79.49
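As a point of reference for Table 7, APCER (attack presentation classification error rate) is the proportion of attack presentations that a PAD method wrongly classifies as bona fide. A minimal sketch with hypothetical decisions (not the code of any of the evaluated detectors):

```python
# Minimal APCER sketch for one PAI species (hypothetical decisions;
# not the code of the evaluated PAD methods).
def apcer(accepted_as_bona_fide):
    """accepted_as_bona_fide: list of booleans, one per attack presentation,
    True if the PAD method wrongly accepted the attack as bona fide."""
    return 100.0 * sum(accepted_as_bona_fide) / len(accepted_as_bona_fide)

# Example: 5 nose-based artifact presentations, 4 misclassified as bona fide.
print(apcer([True, True, False, True, True]))   # -> 80.0
```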

6.3 Performance Analysis

6.3.1 Performance Analysis of Different Smartphones.

To evaluate the impact of different smartphones on performance, experiments are conducted on the data collected by each smartphone, and the results are listed in Table 8. The IAPMR and APCER results for the data collected with the Honor 9 and the Huawei Mate 9 Pro are similar to each other, whereas the results for the iPhone X differ from them to a certain extent. The reason is that the Honor 9 and the Huawei Mate 9 Pro are from the same corporation and their frontal cameras have the same resolution (8 megapixels), while the iPhone X is from a different manufacturer and the resolution of its frontal camera is 7 megapixels.
Table 8. Performance Analysis of Different Smartphones (all values in %)
Phones | Face++ IAPMR | VGG-Face IAPMR | Texture [35] APCER | Quality [13] APCER | Structure [20] APCER | Color [3] APCER | Motion [56] APCER
Honor | 53.63 | 28.23 | 81.05 | 99.60 | 93.55 | 35.48 | 76.61
Huawei | 40.73 | 29.84 | 93.15 | 97.98 | 98.79 | 34.68 | 58.06
iPhone | 35.89 | 29.03 | 95.16 | 69.35 | 91.53 | 91.53 | 87.90

6.3.2 Performance Analysis of Different Presentation Attack Instruments.

To analyze the impact of different PAIs, experiments are conducted on facial artifacts printed on A4 paper and on photo paper, and the results are listed in Table 9. Higher IAPMR and APCER results are achieved with the artifacts printed on A4 paper, except for the results on the VGG-Face model. Compared with the matte appearance of A4 paper, the surface of photo paper is highly glossy, which differs considerably from human skin and thus degrades performance.
Table 9. Performance Analysis of Different Presentation Attack Instruments (PAIs) (all values in %)
PAIs | Face++ IAPMR | VGG-Face IAPMR | Texture [35] APCER | Quality [13] APCER | Structure [20] APCER | Color [3] APCER | Motion [56] APCER
A4 paper | 40.73 | 29.84 | 93.15 | 97.98 | 98.79 | 34.68 | 58.06
Photo paper | 32.26 | 29.84 | 91.13 | 89.52 | 97.98 | 12.10 | 48.79

6.3.3 Performance Analysis of Different Target Victims.

To investigate impostor attack performance for different categories of target victims, experiments are conducted separately on male (same gender as the attacker) and female target victims, and the results are shown in Figure 9. The IAPMR results for male victims are noticeably higher than those for female victims when the attacker is male. Since there are significant differences in facial appearance between males and females, it is difficult for an attacker to impersonate a data subject of a different gender. In addition, comparing the results of different facial components, the performance of the central-face-based artifacts is less affected by gender.
Fig. 9. IAPMR (%) for mated gender (blue) and different gender (orange) of the attacker to target victims. (a) Face++. (b) VGG-Face.

6.3.4 Analysis of Generalization Ability.

To compromise a face verification system in a real-world scenario, an attacker cannot acquire a large amount of verification feedback from the target system in advance. Accordingly, Azure is used in this work as an unknown face verification system to evaluate the generalization ability of the facial artifacts, which are constructed based on the feedback from Face++ and VGG-Face, and the results are listed in Table 10. The proposed attack achieves a mean IAPMR of 44.89% on Azure, which almost equals the mean IAPMR of 43.42% on Face++. This demonstrates that facial artifacts constructed with feedback from Face++ and VGG-Face generalize to an unknown system. Comparing the results for different target victims, the attack performance on Azure is also influenced by gender.
Table 10. IAPMR (%) of an Unknown System (Azure)
Attacks | Nose | Upper | Midface | Central | Mean
All victims | 37.10 | 60.22 | 44.62 | 37.63 | 44.89
Male | 47.62 | 67.62 | 65.71 | 40.00 | 55.24
Female | 23.46 | 50.62 | 17.28 | 34.57 | 31.48

6.3.5 Performance Analysis of Sequentially Compound Attack.

To tolerate strong intra-class variations, real face verification applications usually allow a user to make multiple attempts during a verification. Accordingly, experiments are conducted with compound attacks that combine multiple types of facial artifacts, as sketched below. Specifically, for each target victim, a compound attack is launched in the order of nose-, central-face-, upper-face-, and midface-based artifacts, and each artifact is used only once. If any of the artifacts in a compound attack matches the target reference, the target victim is considered successfully impersonated. With compound attacks, IAPMR = 81.18% is obtained on Azure, and the average number of attempts is 2.33. Therefore, compared with a single type of facial artifact, impostor attack performance can be further improved by compound attacks, which also indicates that the proposed nose-, central-face-, upper-face-, and midface-based artifacts are complementary to each other.
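The sequential protocol can be summarized by the following sketch, which assumes a generic verify(artifact, reference) call returning a match decision; it illustrates the protocol rather than reproducing the authors' implementation:

```python
# Sketch of the sequential compound attack protocol (verify() is an assumed
# helper returning True on a match; not the authors' implementation).
ARTIFACT_ORDER = ["nose", "central_face", "upper_face", "midface"]

def compound_attack(verify, reference):
    """Present each artifact type once, in the fixed order above.
    Returns (success, number_of_attempts)."""
    for attempts, artifact in enumerate(ARTIFACT_ORDER, start=1):
        if verify(artifact, reference):       # a single match suffices
            return True, attempts
    return False, len(ARTIFACT_ORDER)

def compound_iapmr(verify, references):
    """IAPMR (in %) of the compound attack over all target references."""
    outcomes = [compound_attack(verify, ref)[0] for ref in references]
    return 100.0 * sum(outcomes) / len(references)
```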

6.3.6 Evaluations with Real-world System.

To analyze the performance of the proposed attack on a real-world system, experiments are conducted on the real-world face PAD system BioID,5 and the results are listed in Table 11. For each target victim, an attacker is assumed to have three attack trials. BioID can be bypassed by the proposed nose-, upper-face-, and central-face-based artifacts with APCER \(\gt\) 93%. However, around half of the midface-based artifacts are correctly rejected by the system. The main reason is that the paste points are fixed at the nose and cheeks when a midface-based artifact is pasted onto an attacker's face, which leads to planar effects. Although such effects are reduced by warping the midface-based artifact, its structure still differs from that of a bona fide presentation to a certain extent.
Table 11. Results of the Real-world System BioID
Attacks | Nose | Upper | Midface | Central | Mean
APCER (%) | 93.55 | 100.00 | 48.39 | 96.77 | 84.68
Average number of trials | 1.14 | 1.15 | 1.93 | 1.40 | 1.41
In addition to the results from BioID, experiments are also conducted on another real-world face verification system, Neurotechnology VeriLook 11.2 Demo,6 where a threshold of FAR = 0.1% is used. In the experiments, the attacker tries to impersonate four male target victims by using nose-based facial artifacts. For each target victim, the attacker is allowed to try three times. Experiments are conducted in an indoor environment, and the capture device is a web camera with a resolution of 640 \(\times\) 480 pixels. The results show that one-fourth of the target victims’ identities can be successfully impersonated.

6.3.7 Performance Analysis of Different Thresholds.

According to the International Standard ISO/IEC 30107-3 on biometric presentation attack detection [18], PAD performance should be reported as BPCER at a fixed APCER (e.g., BPCER at APCER = 5% or at APCER = 10%). Since the collected database contains only attack presentations and no bona fide presentations, the proposed attack is instead evaluated with detection thresholds set at APCER = 5% and 10%, and the results are listed in Table 12. The proposed attack achieves very high attack success rates under these standard PAD operating points.
Table 12. APCER (%) from the Guided Scale Texture-based Method [35] with Different Thresholds
Threshold set at | Nose | Upper | Midface | Central | Mean
APCER = 5% | 100.00 | 100.00 | 96.77 | 100.00 | 99.19
APCER = 10% | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
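Because no bona fide data are available, the thresholds in Table 12 are fixed so that a reference attack set yields the target APCER. A minimal sketch of such a threshold selection, assuming the detector outputs a score where higher means more bona-fide-like (the score convention and the reference set are assumptions, not details given in the article):

```python
import numpy as np

def threshold_at_apcer(reference_attack_scores, target_apcer):
    """Pick a decision threshold so that the reference attack set yields
    approximately the target APCER (assumed convention: scores at or above
    the threshold are accepted as bona fide; ties are ignored)."""
    scores = np.sort(np.asarray(reference_attack_scores, dtype=float))[::-1]
    k = int(np.ceil(target_apcer * len(scores)))   # number of attacks to accept
    return scores[k - 1] if k > 0 else np.inf

def apcer_at_threshold(attack_scores, threshold):
    """APCER (in %) of a new attack type at the chosen threshold."""
    return 100.0 * float(np.mean(np.asarray(attack_scores, dtype=float) >= threshold))
```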

6.3.8 Performance Analysis of Different Illumination Conditions.

To analyze the performance of the proposed attack under different illumination conditions, experiments are conducted under three conditions: normal, strong, and weak illumination. The normal-illumination experiments are carried out in an indoor environment lit by outdoor daytime light. The strong-illumination experiments are carried out in the same indoor environment, but with a lamp as an additional light source. The weak-illumination experiments are carried out in another, relatively dark indoor environment. In the experiments, the attacker tries to impersonate four male target victims by using nose-based facial artifacts, and for each target victim the attacker is allowed three attempts. The capture device is the frontal camera of a Redmi Note 4X smartphone, and example images of the different illumination conditions are shown in Figure 10. The performance is evaluated by Face++ and ResNet-50-128D [5], and the results are listed in Table 13, where a threshold of FAR = 0.1%, tuned on the LFW database [16], is used for ResNet-50-128D. For Face++, the same IAPMR results are obtained under the three illumination conditions, while for ResNet-50-128D, the IAPMR under strong illumination is lower than under the other two conditions. The possible reason is that Face++ is more robust to illumination changes than ResNet-50-128D.
Table 13. IAPMR Results under Different Illumination Conditions
Illumination condition | Normal | Strong | Weak
Face++ | 1/4 | 1/4 | 1/4
ResNet-50-128D | 3/4 | 2/4 | 3/4
Fig. 10. Example images of different illumination conditions. (a) Normal illumination. (b) Strong illumination. (c) Weak illumination.
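For the ResNet-50-128D results above, verification follows the standard deep-embedding pipeline: each face is mapped to a 128-D feature vector, and the cosine similarity between the probe and reference embeddings is compared against the threshold tuned at FAR = 0.1% on LFW. A minimal sketch with random embeddings and an illustrative threshold value (the actual threshold is not reported in the article):

```python
import numpy as np

def cosine_similarity(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(probe_emb, reference_emb, threshold):
    """Accept the identity claim if the cosine similarity between the 128-D
    embeddings reaches the threshold (here tuned at FAR = 0.1% on LFW)."""
    return cosine_similarity(probe_emb, reference_emb) >= threshold

# Example with random embeddings and an illustrative (not actual) threshold.
rng = np.random.default_rng(0)
probe, reference = rng.normal(size=128), rng.normal(size=128)
print(verify(probe, reference, threshold=0.36))
```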

6.3.9 Performance Comparison with Print, Replay, and Mask Attacks.

For face verification, the proposed attack is compared with zero-effort impostor attacks (the attacker's own face is presented) and custom silicone face mask attacks [39]. Since the proposed attack is constructed based on the feedback from Face++ and VGG-Face, impostor attack performance is compared only on the unknown Azure system for a fair comparison, and the results are listed in Table 14. All components-based facial artifacts in the proposed attack achieve higher IAPMR results than the zero-effort attacks and the custom silicone face mask attacks [39], which demonstrates that facial-components-based artifacts obtain the best impostor attack performance. Moreover, the distribution of the face verification scores is shown in Figure 11, which indicates that the Azure system is vulnerable to the proposed attack.
Table 14. Impostor Attack Performance Comparison on the Azure System
Attacks | Zero effort | 3D mask [39] | Nose | Upper | Midface | Central | Compound
IAPMR (%) | 0.00 | 20.83 | 37.10 | 60.22 | 44.62 | 37.63 | 81.18
Fig. 11. The distribution of face verification scores from the Azure system (3D mask—custom silicone mask [39]).
For face PAD, the proposed attack is compared with 2D PAs (Replay-Attack [8], MSU MFSD [49]), custom silicone face mask PAs [39], and holistic face-based A4 paper PAs. The Replay-Attack database contains mobile PAs, high-resolution screen PAs, and A4 paper print attacks, while MSU MFSD contains iPad replay PAs, iPhone replay PAs, and A3 paper print attacks. In practice, multiple PAD methods may be deployed together in a high-security system, so the minimum APCER over all face PAD methods is also reported to evaluate attack performance in this setting. The results are listed in Table 15. All components-based facial artifacts in the proposed PA achieve higher minimum APCER results than the print, replay, and custom silicone mask PAs. Furthermore, in most cases, each individual face PAD method is also more vulnerable to the proposed attack. Since only partial facial components and areas are used for creating the artifacts in this article, the face area presented to a system still retains most of the properties of a bona fide presentation (e.g., physiological signals, 3D structure, skin color, human skin appearance, rigid and non-rigid motions). Meanwhile, an attacker wearing a components-based facial artifact can still interact with a system in real time, such as by blinking, opening the mouth, and moving the head. Moreover, most of the face area captured by the system during an attack presentation is the attacker's real face. Although the minimum APCER of the custom silicone mask PAs is almost the same as that of the proposed upper-face-based PAs, a single mask costs nearly USD 4,000 to create, which is much more expensive than the proposed PA. From the above analysis, compared with the existing PAs, the proposed facial-components-based PA poses a more serious threat to both face verification and PAD technologies, and it can be launched easily and at low cost.
Table 15. APCER (%) of Different Face Presentation Attacks (PAs)
Presentation attacks | Texture [35] | Quality [13] | Structure [20] | Color [3] | Motion [56] | BioID | Min
High resolution [8] | 7.25 | 77.50 | 55.50 | 0.00 | 98.75 | - | 0.00
Mobile [8] | 0.75 | 56.75 | 31.50 | 0.00 | 95.75 | - | 0.00
A4 print [8] | 36.50 | 81.00 | 64.50 | 0.50 | 100.00 | - | 0.50
iPad replay [49] | 4.29 | 8.57 | 47.14 | 0.00 | 55.71 | - | 0.00
iPhone replay [49] | 4.29 | 11.43 | 47.14 | 4.29 | 20.00 | - | 4.29
A3 print [49] | 61.43 | 41.43 | 21.43 | 78.57 | 78.57 | - | 21.43
3D mask [39] | 27.15 | 68.68 | 75.87 | 58.00 | - | - | 27.15
Whole face A4 print | - | - | - | - | - | 0.00 | -
Nose (proposed) | 99.46 | 84.95 | 93.55 | 75.27 | 64.52 | 93.55 | 64.52
Central face (proposed) | 82.26 | 82.26 | 92.48 | 41.40 | 68.82 | 96.77 | 41.40
Midface (proposed) | 77.42 | 95.70 | 93.01 | 54.84 | 74.73 | 48.39 | 54.84
Upper face (proposed) | 100.00 | 93.01 | 99.46 | 27.96 | 88.71 | 100.00 | 27.96
Note: Texture denotes guided scale texture [35], quality denotes image quality assessment [13], structure denotes the diffusion speed model [20], color denotes color texture analysis [3], and motion denotes volume local binary count patterns [56]. Min is the minimum APCER of the five presentation attack detection (PAD) methods.
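The Min column in Table 15 models the setting in which several PAD methods run side by side and an attack presentation succeeds only if every method accepts it; the minimum of the individual APCERs is an upper bound on the APCER of such an AND-fused ensemble. A minimal sketch of the corresponding decision rule (the per-method decisions are assumed inputs, not the authors' code):

```python
# Sketch of AND fusion across a bank of PAD methods (assumed per-method
# decisions; not the authors' code). The minimum per-method APCER reported
# in Table 15 upper-bounds the APCER of this ensemble.
def ensemble_accepts(per_method_decisions):
    """per_method_decisions: dict {method_name: True if the method accepts
    the presentation as bona fide}. The attack succeeds only if all accept."""
    return all(per_method_decisions.values())

def ensemble_apcer(decisions_per_presentation):
    """APCER (in %) of the ensemble over a list of attack presentations,
    each given as a dict of per-method decisions."""
    passed = sum(ensemble_accepts(d) for d in decisions_per_presentation)
    return 100.0 * passed / len(decisions_per_presentation)
```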

6.3.10 Performance Comparison with Adversarial Examples-based Attacks.

To demonstrate the effectiveness of the proposed attack, experiments are conducted to compare the proposed PA with adversarial examples-based PAs [41, 42]. For a fair comparison, the PAIs of all methods are created using VGG-Face and are evaluated by Face++, ResNet-50-128D, and VeriLook (with a threshold of FAR = 0.1%). All PAIs are printed with a Canon G2800 printer. In the experiments, the attacker tries to impersonate four male target victims, and for each target victim the attacker is allowed three attempts. All experiments are conducted in an indoor environment with normal illumination, and the results are listed in Table 16. The proposed nose-based PA achieves higher IAPMR and mean verification scores than the existing adversarial examples-based PAs [41, 42], which demonstrates that it generalizes better across systems. Since the focus of this article is PAs against unattended face verification systems, inconspicuousness is out of the scope of this article. According to the original results in [41], its success rate of impersonation attacks on Face++ is 100%, which differs from the results reproduced here. One possible reason is that the results of [41] on Face++ were obtained in the digital domain with face image classification models, whereas the results in this subsection are obtained in the physical domain under a verification scenario. Another possible reason is that the data subjects used in this subsection differ from those used in [41].
Table 16. Comparison between the Adversarial Examples-based Presentation Attacks (PAs) and the Proposed PA
Presentation attacks (PAs) | Face++ IAPMR | Face++ score | ResNet-50-128D IAPMR | ResNet-50-128D score | VeriLook IAPMR | VeriLook score
Zero effort (attacker's own face) | 0/4 | 38.698 | 0/4 | 0.896 | 0/4 | -
Accessorize to a crime [41] | 0/4 | 42.598 | 1/4 | 0.902 | 0/4 | -
Adversarial generative nets [42] | 0/4 | 40.495 | 0/4 | 0.901 | 0/4 | -
Nose-based PA (proposed) | 1/4 | 53.490 | 3/4 | 0.925 | 1/4 | 44
Note: scores are the mean verification scores of all the attack presentations; a higher score represents better impostor attack performance. The scores of ResNet-50-128D are the cosine similarity between two facial feature vectors.

6.3.11 Performance Analysis of Different Ethnicity.

To analyze the performance of the proposed attack when attackers and target victims come from different ethnic groups, experiments are performed on Face++ in the digital domain with the ND-IIITD retouched faces database [2], from which 236 Caucasians, 62 Asians, and 3 Africans are used. In the experiments, each Caucasian acts as an attacker in turn, and two Asians and two Africans are randomly chosen as target victims; to make this random choice, each data subject is assigned a unique number. When the attackers are Caucasian and the target victims are Asian, the proposed nose-based, central-face-based, and upper-face-based attacks achieve IAPMRs of 5.72%, 55.72%, and 55.08%, respectively. When the attackers are Caucasian and the target victims are African, the corresponding attacks achieve IAPMRs of 12.08%, 67.80%, and 63.56%, respectively. Since facial appearances and biometric characteristics differ considerably between ethnic groups, the IAPMR results across ethnicities are lower than those of the mated-ethnicity case listed in Table 5.

6.3.12 Performance Analysis of Different Attackers.

To analyze the performance of the proposed attack when launched by different attackers, experiments are conducted with two other male participants, both university students. The first participant knows some basics of face verification, the second has no background in face verification, and their faces do not directly match each other on Face++. In the experiments, the participants try to impersonate each other's identity by using the proposed facial artifacts (nose-, upper-face-, midface-, and central-face-based), and each type of facial artifact may be tried three times. When the attacker is the second participant, the Face++ results show that the nose-based and midface-based artifacts achieve successful impostor attacks; when the attacker is the first participant, the proposed facial artifacts are unsuccessful. These results demonstrate that even an inexperienced attacker can successfully launch the proposed attack. Apart from the experiments on Face++, the participants also try to attack the face unlocking system of a Huawei smartphone, but the proposed facial artifacts do not succeed.

6.3.13 Performance Analysis with Deep Learning Presentation Attack Detection Methods.

To investigate the detection accuracy of deep learning PAD methods against the proposed attack, experiments are conducted with a state-of-the-art PAD method, spatial aggregation of pixel-level local classifiers (SAPLC) [46], and a classical convolutional neural network, ResNet-50 [15]. SAPLC is implemented with fully convolutional networks supervised by a local ternary label: the fully convolutional networks are trained to predict pixel-level scores for local image patches, and a shallow neural network is then trained to aggregate the pixel-level scores for classification. The architecture of ResNet-50 is built from shortcut connections and is trained within a residual learning framework. To apply the pretrained ResNet-50 network to face PAD, this article fine-tunes it via transfer learning: the initial classification layer of ResNet-50 is replaced by a binary classification layer, while the architecture and the parameters of the other layers are frozen. The pretrained ResNet-50 network is fine-tuned with the binary cross-entropy loss, a batch size of 128, and 100 epochs; the learning rate starts at 0.01 and decays to 0.001 and 0.0001 after 40 and 70 epochs, respectively. The dataset used for training the detection models and tuning the thresholds is the same as that described in Section 6.1. The detection results on the collected images captured by the iPhone X are listed in Table 17. Comparing Table 8 and Table 17, the APCERs of the deep learning PAD methods are lower than those of the traditional PAD methods, owing to the powerful learning capability of deep networks. Nevertheless, since the proposed attack still achieves a mean APCER \(\gt\) 45%, it can evade detection by the deep learning PAD methods to a considerable extent.
Table 17. APCER (%) of Deep Learning Presentation Attack Detection Methods on the Collected Images Captured by iPhone X
Attacks | Nose | Central | Midface | Upper | Mean
SAPLC [46] | 36.08 | 71.30 | 72.45 | 62.06 | 60.47
ResNet-50 [15] | 53.84 | 41.06 | 32.40 | 55.42 | 45.68
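The ResNet-50 fine-tuning recipe described above (frozen pretrained backbone, new binary classification layer, binary cross-entropy loss, batch size 128, 100 epochs, learning rate 0.01 decayed to 0.001 and 0.0001 after 40 and 70 epochs) could look roughly as follows in PyTorch. This is an illustrative reconstruction under those stated settings, not the authors' code; the data loader and label convention are assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

# Illustrative reconstruction of the fine-tuning setup described in the text
# (not the authors' code; data loading and label convention are assumed).
model = models.resnet50(pretrained=True)           # ImageNet-pretrained backbone
for p in model.parameters():                       # freeze all pretrained layers
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 1)      # new binary classification layer

criterion = nn.BCEWithLogitsLoss()                 # binary cross-entropy on logits
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.MultiStepLR(  # 0.01 -> 0.001 -> 0.0001
    optimizer, milestones=[40, 70], gamma=0.1)

def fine_tune(loader, epochs=100, device="cuda"):
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:              # loader assumed to use batch size 128
            images = images.to(device)
            # assumed label convention: 1 = bona fide, 0 = attack
            labels = labels.float().unsqueeze(1).to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
```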

6.3.14 Analysis of Limitations.

During the evaluation of the proposed attack, some components-based facial artifacts are still correctly rejected by Face++, VGG-Face, and Azure. This happens when there are large discrepancies in facial appearance between an attacker and a target victim, caused by a different gender or a large age gap (8 years); examples are shown in Figure 12(a) and 12(b), respectively. For a target victim with long hair, most of the facial biometric characteristics of the upper-face-based artifact are occluded, as shown in Figure 12(c). Meanwhile, the collected database contains only Chinese subjects, and its size is also limited. Moreover, region-based PAD methods [4, 11] may be able to detect the proposed attack.
Fig. 12. Examples of the proposed artifacts failed in impostor attacks. (a) Different gender. (b) Different age. (c) Occlusion.

7 Conclusions

In this article, a face PA using facial components is proposed to compromise unattended face verification systems. The vulnerability of different facial components to impostor PAs is investigated, showing that a target victim can be impersonated using only a partial face. Experimental results and analyses show that the proposed PA outperforms the existing print, replay, mask, and adversarial examples-based PAs in attacking unknown face verification and PAD systems. Furthermore, the proposed PA is simple and low cost, so it can be easily implemented by an attacker and poses a real threat to existing face verification systems. The results also indicate that the nose is the most vulnerable facial component to impostor PAs. However, the performance of the proposed PA is still affected by factors such as gender and a large age gap. Our future work will focus on creating components-based artifacts with custom silicone masks and on PAD methods for the proposed attack.

Footnotes

References

[1]
Akshay Agarwal, Akarsha Sehwag, Richa Singh, and Mayank Vatsa. 2019. Deceiving face presentation attack detection via image transforms. In 5th IEEE International Conference on Multimedia Big Data (BigMM’19). IEEE, 373–382. https://doi.org/10.1109/BigMM.2019.00018
[2]
Aparna Bharati, Richa Singh, Mayank Vatsa, and Kevin W. Bowyer. 2016. Detecting facial retouching using supervised deep learning. IEEE Trans. Inf. Forensics Secur. 11, 9 (2016), 1903–1913. https://doi.org/10.1109/TIFS.2016.2561898
[3]
Zinelabidine Boulkenafet, Jukka Komulainen, and Abdenour Hadid. 2016. Face spoofing detection using colour texture analysis. IEEE Trans. Inf. Forensics Secur. 11, 8 (2016), 1818–1830. https://doi.org/10.1109/TIFS.2016.2555286
[4]
R. Cai, H. Li, S. Wang, C. Chen, and A. C. Kot. 2021. DRL-FAS: A novel framework based on deep reinforcement learning for face anti-spoofing. IEEE Trans. Inf. Forensics Secur. 16 (2021), 937–951. https://doi.org/10.1109/TIFS.2020.3026553
[5]
Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, and Andrew Zisserman. 2018. VGGFace2: A dataset for recognising faces across pose and age. In 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG’18). IEEE Computer Society, 67–74. https://doi.org/10.1109/FG.2018.00020
[6]
Cunjian Chen, Antitza Dantcheva, Thomas Swearingen, and Arun Ross. 2017. Spoofing faces using makeup: An investigative study. In IEEE International Conference on Identity, Security and Behavior Analysis (ISBA’17). IEEE, 1–8. https://doi.org/10.1109/ISBA.2017.7947686
[7]
Jianbo Chen, Michael I. Jordan, and Martin J. Wainwright. 2020. Hopskipjumpattack: A query-efficient decision-based attack. In 2020 IEEE Symposium on Security and Privacy. IEEE, 1277–1294.
[8]
Ivana Chingovska, André Anjos, and Sébastien Marcel. 2012. On the effectiveness of local binary patterns in face anti-spoofing. In 2012 Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG’12)(LNI, Vol. P-196), Arslan Brömme and Christoph Busch (Eds.). GI, 1–7.
[9]
Allan da Silva Pinto, Siome Goldenstein, Alexandre Ferreira, Tiago Carvalho, Hélio Pedrini, and Anderson Rocha. 2020. Leveraging shape, reflectance and albedo from shading for face presentation attack detection. IEEE Trans. Inf. Forensics Secur. 15 (2020), 3347–3358. https://doi.org/10.1109/TIFS.2020.2988168
[10]
Tiago de Freitas Pereira, André Anjos, José Mario De Martino, and Sébastien Marcel. 2013. Can face anti-spoofing countermeasures work in a real world scenario? In International Conference on Biometrics (ICB’13), Julian Fiérrez, Ajay Kumar, Mayank Vatsa, Raymond N. J. Veldhuis, and Javier Ortega-Garcia (Eds.). IEEE, 1–8. https://doi.org/10.1109/ICB.2013.6612981
[11]
D. Deb and A. K. Jain. 2020. Look locally infer globally: A generalizable face anti-spoofing approach. IEEE Trans. Inf Forensics Secur. 16 (2020), 1143–1157. https://doi.org/10.1109/TIFS.2020.3029879
[12]
Matteo Ferrara, Annalisa Franco, and Davide Maltoni. 2014. The magic passport. In IEEE International Joint Conference on Biometrics (IJCB’14). IEEE, 1–7. https://doi.org/10.1109/BTAS.2014.6996240
[13]
Javier Galbally and Sébastien Marcel. 2014. Face anti-spoofing based on general image quality assessment. In 22nd International Conference on Pattern Recognition (ICPR’14). IEEE Computer Society, 1173–1178. https://doi.org/10.1109/ICPR.2014.211
[14]
A. George and S. Marcel. 2021. Learning one class representations for face presentation attack detection using multi-channel convolutional neural networks. IEEE Trans. Inf. Forensics Secur. 16 (2021), 361–375.
[15]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
[16]
Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. 2007. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. Technical Report 07-49. University of Massachusetts, Amherst.
[17]
ISO/IEC JTC1 SC37 Biometrics. 2016. ISO/IEC 30107-1 Information Technology - Biometric Presentation Attack Detection - Part 1: Framework. International Organization for Standardization.
[18]
ISO/IEC JTC1 SC37 Biometrics. 2017. ISO/IEC 30107-3 Information Technology - Biometric Presentation Attack Detection - Part 3: Testing and Reporting. International Organization for Standardization.
[19]
Shan Jia, Guodong Guo, and Zhengquan Xu. 2020. A survey on 3D mask presentation attack detection and countermeasures. Pattern Recognit. 98, Article 107032 (2020), 1–13. https://doi.org/10.1016/j.patcog.2019.107032
[20]
Wonjun Kim, Sungjoo Suh, and Jae-Joon Han. 2015. Face liveness detection from a single image via diffusion speed model. IEEE Trans. Image Process. 24, 8 (2015), 2456–2465. https://doi.org/10.1109/TIP.2015.2422574
[21]
Davis E. King. 2009. Dlib-ml: A machine learning toolkit. J. Mach. Learn. Res. 10 (2009), 1755–1758.
[22]
Ketan Kotwal, Zohreh Mostaani, and Sébastien Marcel. 2020. Detection of age-induced makeup attacks on face recognition systems using multi-layer deep features. IEEE Trans. Biom. Behav. Identity Sci. 2, 1 (2020), 15–25. https://doi.org/10.1109/TBIOM.2019.2946175
[23]
Haoliang Li, Peisong He, Shiqi Wang, Anderson Rocha, Xinghao Jiang, and Alex C. Kot. 2018. Learning generalized deep feature representation for face anti-spoofing. IEEE Trans. Inf. Forensics Secur. 13, 10 (2018), 2639–2652. https://doi.org/10.1109/TIFS.2018.2825949
[24]
H. Li, W. Li, H. Cao, S. Wang, F. Huang, and A. C. Kot. 2018. Unsupervised domain adaptation for face anti-spoofing. IEEE Trans. Inf. Forensics Secur. 13, 7 (2018), 1794–1809.
[25]
Xiao-Xin Li, Dao-Qing Dai, Xiao-Fei Zhang, and Chuan-Xian Ren. 2013. Structured sparse error coding for face recognition with occlusion. IEEE Trans. Image Process. 22, 5 (2013), 1889–1900. https://doi.org/10.1109/TIP.2013.2237920
[26]
Yan Li, Yingjiu Li, Ke Xu, Qiang Yan, and Robert H. Deng. 2018. Empirical study of face authentication systems under OSNFD attacks. IEEE Trans. Dependable Secur. Comput. 15, 2 (2018), 231–245. https://doi.org/10.1109/TDSC.2016.2550459
[27]
Yan Li, Yingjiu Li, Qiang Yan, Hancong Kong, and Robert H. Deng. 2015. Seeing your face is not enough: An inertial sensor-based liveness detection for face authentication. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Indrajit Ray, Ninghui Li, and Christopher Kruegel (Eds.). ACM, 1558–1569. https://doi.org/10.1145/2810103.2813612
[28]
Ajian Liu, Jun Wan, Sergio Escalera, Hugo Jair Escalante, Zichang Tan, Qi Yuan, Kai Wang, Chi Lin, Guodong Guo, Isabelle Guyon, and Stan Z. Li. 2019. Multi-modal face anti-spoofing attack detection challenge at CVPR2019. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops’19). Computer Vision Foundation/IEEE, 1601–1610. https://doi.org/10.1109/CVPRW.2019.00202
[29]
Siqi Liu, Xiangyuan Lan, and Pong C. Yuen. 2018. Remote photoplethysmography correspondence feature for 3D mask face presentation attack detection. In Proceedings of the 15th European Conference on Computer Vision (ECCV’18), Part XVI(Lecture Notes in Computer Science, Vol. 11220), Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss (Eds.). Springer, 577–594. https://doi.org/10.1007/978-3-030-01270-0_34
[30]
Yaojie Liu, Joel Stehouwer, Amin Jourabloo, and Xiaoming Liu. 2019. Deep tree learning for zero-shot face anti-spoofing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR’19). Computer Vision Foundation/IEEE, 4680–4689. https://doi.org/10.1109/CVPR.2019.00481
[31]
Puspita Majumdar, Akshay Agarwal, Richa Singh, and Mayank Vatsa. 2019. Evading face recognition via partial tampering of faces. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops’19). Computer Vision Foundation/IEEE, 11–20. https://doi.org/10.1109/CVPRW.2019.00008
[32]
Dinh-Luan Nguyen, Sunpreet S. Arora, Yuhang Wu, and Hao Yang. 2020. Adversarial light projection attacks on face recognition systems: A feasibility study. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR Workshops’20). IEEE, 3548–3556. https://doi.org/10.1109/CVPRW50498.2020.00415
[33]
Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman. 2015. Deep face recognition. In Proceedings of the British Machine Vision Conference 2015 (BMVC’15), Xianghua Xie, Mark W. Jones, and Gary K. L. Tam (Eds.). BMVA Press, 41.1–41.12. https://doi.org/10.5244/C.29.41
[34]
Fei Peng, Le Qin, and Min Long. 2018. CCoLBP: Chromatic co-occurrence of local binary pattern for face presentation attack detection. In 27th International Conference on Computer Communication and Networks (ICCCN’18). IEEE, 1–9. https://doi.org/10.1109/ICCCN.2018.8487325
[35]
Fei Peng, Le Qin, and Min Long. 2018. Face presentation attack detection using guided scale texture. Multim. Tools Appl. 77, 7 (2018), 8883–8909. https://doi.org/10.1007/s11042-017-4780-0
[36]
Fei Peng, Le Qin, and Min Long. 2020. Face presentation attack detection based on chromatic co-occurrence of local binary pattern and ensemble learning. J. Vis. Commun. Image Represent. 66 (2020), 102746. https://doi.org/10.1016/j.jvcir.2019.102746
[37]
Yunxiao Qin, Chenxu Zhao, Xiangyu Zhu, Zezheng Wang, Zitong Yu, Tianyu Fu, Feng Zhou, Jingping Shi, and Zhen Lei. 2020. Learning meta model for zero- and few-shot face anti-spoofing. In The 34th AAAI Conference on Artificial Intelligence (AAAI’20), The 32nd Innovative Applications of Artificial Intelligence Conference (IAAI’20), The 10th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI’20). AAAI Press, 11916–11923.
[38]
Ramachandra Raghavendra and Christoph Busch. 2017. Presentation attack detection methods for face recognition systems: A comprehensive survey. ACM Comput. Surv. 50, 1 (2017), 8:1–8:37. https://doi.org/10.1145/3038924
[39]
Raghavendra Ramachandra, Sushma Venkatesh, Kiran B. Raja, Sushil Bhattacharjee, Pankaj Wasnik, Sébastien Marcel, and Christoph Busch. 2019. Custom silicone face masks: Vulnerability of commercial face recognition systems & presentation attack detection. In 7th International Workshop on Biometrics and Forensics (IWBF’19). IEEE, 1–6. https://doi.org/10.1109/IWBF.2019.8739236
[40]
Research and Development Unit. 2015. Best Practice Technical Guidelines for Automated Border Control (ABC) Systems. Technical Report. FRONTEX.
[41]
Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. 2016. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Edgar R. Weippl, Stefan Katzenbeisser, Christopher Kruegel, Andrew C. Myers, and Shai Halevi (Eds.). ACM, 1528–1540. https://doi.org/10.1145/2976749.2978392
[42]
Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. 2019. A general framework for adversarial examples with objectives. ACM Trans. Priv. Secur. 22, 3 (2019), 16:1–16:30. https://doi.org/10.1145/3317611
[43]
Meng Shen, Zelin Liao, Liehuang Zhu, Ke Xu, and Xiaojiang Du. 2019. VLA: A practical visible light-based attack on face recognition systems in physical world. IMWUT 3, 3 (2019), 103:1–103:19. https://doi.org/10.1145/3351261
[44]
Maneet Singh, Richa Singh, Mayank Vatsa, Nalini K. Ratha, and Rama Chellappa. 2019. Recognizing disguised faces in the wild. IEEE Trans. Biom. Behav. Identity Sci. 1, 2 (2019), 97–108. https://doi.org/10.1109/TBIOM.2019.2903860
[45]
Holger Steiner, Andreas Kolb, and Norbert Jung. 2016. Reliable face anti-spoofing using multispectral SWIR imaging. In International Conference on Biometrics (ICB’16). IEEE, 1–8. https://doi.org/10.1109/ICB.2016.7550052
[46]
Wenyun Sun, Yu Song, Changsheng Chen, Jiwu Huang, and Alex C. Kot. 2020. Face spoofing detection based on local ternary label supervision in fully convolutional networks. IEEE Trans. Inf. Forensics Secur. 15 (2020), 3181–3196. https://doi.org/10.1109/TIFS.2020.2985530
[47]
Di Tang, Zhe Zhou, Yinqian Zhang, and Kehuan Zhang. 2018. Face flashing: A secure liveness detection protocol based on light reflections. In 25th Annual Network and Distributed System Security Symposium (NDSS’18). Internet Society.
[48]
Guoqing Wang, Hu Han, Shiguang Shan, and Xilin Chen. 2021. Unsupervised adversarial domain adaptation for cross-domain face presentation attack detection. IEEE Trans. Inf. Forensics Secur. 16 (2021), 56–69. https://doi.org/10.1109/TIFS.2020.3002390
[49]
Di Wen, Hu Han, and Anil K. Jain. 2015. Face spoof detection with image distortion analysis. IEEE Trans. Inf. Forensics Secur. 10, 4 (2015), 746–761. https://doi.org/10.1109/TIFS.2015.2400395
[50]
Yi Xu, True Price, Jan-Michael Frahm, and Fabian Monrose. 2016. Virtual U: Defeating face liveness detection by building virtual models from your public photos. In 25th USENIX Security Symposium (USENIX Security’16), Thorsten Holz and Stefan Savage (Eds.). USENIX Association, 497–512.
[51]
Zitong Yu, Xiaobai Li, Xuesong Niu, Jingang Shi, and Guoying Zhao. 2020. Face anti-spoofing with human material perception. CoRR abs/2007.02157 (2020). arxiv:2007.02157
[52]
Zitong Yu, Chenxu Zhao, Zezheng Wang, Yunxiao Qin, Zhuo Su, Xiaobai Li, Feng Zhou, and Guoying Zhao. 2020. Searching central difference convolutional networks for face anti-spoofing. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’20). IEEE, 5294–5304. https://doi.org/10.1109/CVPR42600.2020.00534
[53]
Bowen Zhang, Benedetta Tondi, and Mauro Barni. 2020. Adversarial examples for replay attacks against CNN-based face recognition with anti-spoofing capability. Comput. Vis. Image Underst. 197–198 (2020), 102988. https://doi.org/10.1016/j.cviu.2020.102988
[54]
Shifeng Zhang, Xiaobo Wang, Ajian Liu, Chenxu Zhao, Jun Wan, Sergio Escalera, Hailin Shi, Zezheng Wang, and Stan Z. Li. 2019. A dataset and benchmark for large-scale multi-modal face anti-spoofing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR’19). Computer Vision Foundation/IEEE, 919–928. https://doi.org/10.1109/CVPR.2019.00101
[55]
Zhiwei Zhang, Junjie Yan, Sifei Liu, Zhen Lei, Dong Yi, and Stan Z. Li. 2012. A face antispoofing database with diverse attacks. In 5th IAPR International Conference on Biometrics (ICB’12), Anil K. Jain, Arun Ross, Salil Prabhakar, and Jaihie Kim (Eds.). IEEE, 26–31. https://doi.org/10.1109/ICB.2012.6199754
[56]
Xiaochao Zhao, Yaping Lin, and Janne Heikkilä. 2018. Dynamic texture recognition using volume local binary count patterns with an application to 2D face spoofing detection. IEEE Trans. Multimedia 20, 3 (2018), 552–566. https://doi.org/10.1109/TMM.2017.2750415
[57]
Zhe Zhou, Di Tang, Xiaofeng Wang, Weili Han, Xiangyu Liu, and Kehuan Zhang. 2018. Invisible mask: Practical attacks on face recognition with infrared. CoRR abs/1803.04683 (2018). arxiv:1803.04683


