1 Introduction
After more than a decade of research [24] and thousands of papers [5], it is well-known that Machine Learning (ML) methods are vulnerable to “adversarial attacks.” Specifically, by introducing imperceptible perturbations (down to a single pixel or byte [15, 88]) in the input data, it is possible to compromise the predictions made by an ML model. Such vulnerability, however, is more dangerous in settings that implicitly assume the presence of adversaries. A cat will not try to fool an ML model. An attacker, in contrast, will actively try to evade an ML detector—the focus of this article.
On the surface, the situation portrayed in research is vexing. The confirmed successes of ML [52] are leading to large-scale deployment of ML in production settings (e.g., References [34, 81, 90]). At the same time, however, dozens of papers showcase adversarial attacks that can crack “any” ML-based detector (e.g., References [16, 61]). Although some papers propose countermeasures (e.g., Reference [77]), they are quickly defeated (e.g., Reference [31]), and typically decrease the baseline performance (e.g., References [16, 35]). As a result, recent reports [38, 57] focusing on the integration of ML in practice reveal that: “I Never Thought About Securing My Machine Learning Systems” [26]. This is not surprising: If ML can be so easily broken, then why invest resources in increasing its security through—unreliable—defenses?
Sovereign entities (e.g., References [3, 4]) are endorsing the development of “trustworthy” ML systems; yet, any enhancement should be economically justified. No system is foolproof (ML-based or not [29]), and guaranteeing protection against omnipotent attackers is an enticing but unattainable objective. In our case, a security system should increase the cost incurred by an attacker to achieve their goal [66]. Real attackers have a cost/benefit mindset [99]: they may try to evade a detector, but only if doing so yields positive returns. In reality, worst-case scenarios are an exception—not the norm.
Our article is inspired by several recent works that pointed out some “inconsistencies” in the adversarial attacks carried out by prior studies. Pierazzi et al. [78] observe that real attackers operate in the “problem-space,” i.e., the perturbations they can introduce are subject to physical constraints. If such constraints are not met, and hence the perturbation is introduced in the “feature-space” (e.g., Reference [68]), then there is a risk of generating an adversarial example that is not physically realizable [92]. Apruzzese et al. [14], however, highlight that even “impossible” perturbations can be applied, but only if the attacker has internal access to the data-processing pipeline of the target system. Nonetheless, Biggio and Roli suggest that ML security should focus on “anticipating the most likely threats” [24]. Only after proactively assessing the impact of such threats can a suitable countermeasure be developed—if required.
We aim to promote the development of secure ML systems. However, meeting Biggio and Roli’s recommendation presents two tough challenges for research papers. First, it is necessary to devise a realistic threat model, which portrays adversarial attacks that are not only physically realizable but also economically viable. Devising such a threat model, however, requires a detailed security analysis of the specific cyberthreat addressed by the detector—while factoring in the resources that attackers are willing to invest. Second, it is necessary to evaluate the impact of the attack by crafting the corresponding perturbations. Doing so is difficult if the threat model assumes an attacker operating in the problem-space, because such perturbations must be applied to the raw data, i.e., before any preprocessing occurs—and such raw data is hard to find.
In this article, we tackle both of these challenges. In particular, we focus on ML systems for Phishing Website Detection (PWD). Countering phishing—still a major threat today [8, 53]—is an endless struggle. Blocklists can be easily evaded [91] and, to cope with adaptive attackers, some detectors are equipped with ML (e.g., Reference [90]). Yet, as shown by Liang et al. [61], even such ML-PWD can be “cracked” by oblivious attackers—if they invest enough effort to reverse engineer the entire ML-PWD. Indeed, we address ML-PWD because prior work (e.g., References [23, 40, 59, 85]) assumed threat models that hardly resemble a real scenario. Phishing, by nature, is meant to be cheap [54] and most attempts end up in failure [71]. It is unlikely that a phisher invests many resources just to evade ML-PWD: even if a website is not detected, the user may be “hooked,” but is not “phished” yet. As a result, the state of the art on adversarial ML for PWD is immature—from a pragmatic perspective.
Contribution and Organization. Let us explain how we aim to spearhead the security enhancements to ML-PWD. We begin by introducing the fundamental concepts (PWD, ML, and adversarial ML) at the base of our article in Section 2, which also serves as a motivation. Then, we make the following five contributions.
– We formalize the evasion-space of adversarial attacks against ML-PWD (Section 3), rooted in exhaustive analyses of a generic ML-PWD. Such evasion-space explains “where” a perturbation can be introduced to fool an ML-PWD. Our formalization highlights that even adversarial samples created by direct feature manipulation can be realistic, validating all the attacks performed by past work.
– By using our formalization as a stepping stone, we propose a realistic threat model for evasion attacks against ML-PWD (Section 4). Our threat model is grounded on detailed security considerations from the viewpoint of a typical phisher, who is confined to the “website-space.” Nevertheless, our model can be relaxed by assuming attackers with greater capabilities (which leads to higher cost).
– We combine and practically demonstrate the two previous contributions. We perform an extensive, reproducible, and statistically validated evaluation of adversarial attacks against state-of-the-art ML-PWD. By using diverse datasets, ML algorithms, and features, we develop 18 ML-PWD (Section 5), each of which is assessed against 12 different evasion attacks built upon our threat model (Section 6).
– By analyzing the results (Section 7) of our evaluation: (i) we show the impact of attacks that are very likely to occur against both baseline and adversarially robust ML-PWD, and (ii) we are the first to fairly compare the effectiveness of evasion attacks in the problem-space with those in the feature-space.
– As an additional contribution of this journal article, we propose and empirically assess 6 new URL-related perturbations, as well as 37 new HTML-related perturbations, which envision an attacker who can operate in multiple spaces (Section 8).
Our results highlight that more realistic attacks are not as disruptive as claimed by past works (Section 9), but their low cost makes them a threat that induces statistically significant degradation. Intriguingly, however, some “cheap” perturbations can lead to devastating impacts. Finally, our evaluation serves as a “benchmark” for future studies: we provide the complete results in the Appendix, whereas the source code and additional resources are publicly available at a dedicated website: https://spacephish.github.io.
6 Evaluation: Attacks (Rationale and Implementation)
We now focus on our considered attacks. We begin by providing an extensive overview (Section 6.1), and then summarize the workflow for their empirical evaluation (Section 6.2). Finally, we describe their technical implementation (Section 6.3).
6.1 Considered Attacks
In our article, we consider a total of 12 evasion attacks, divided into four families. One of these families is an exact replica of our “standard” threat model. The remaining three families, however, are extensions of our threat model, which assume more “advanced” adversaries who have superior knowledge and/or capabilities.
Two of our families involve WsP (WA and \(\widehat{{\sf WA}}\)), but assume attackers with different knowledge; whereas the remaining two families involve either PsP or MsP (PA and MA). Each family has three variants, depending on the features “targeted” by the attacker, i.e., either those related to the URL, the HTML, or a combination of both (u, r, or c). For WsP, the underlying “attacked” features are always the same for all variants, and are assumed to be known by the attacker: for u it is always the URL_length; for r it is the HTML_objectRatio; and for c it is both of these. (Do note that our WsP will also affect features beyond the attacker’s knowledge.)
– Cheap Website Attacks (WA) perfectly align with our threat model (and resemble the use-cases in Section 4.5). The perturbations are created in the website-space (WsP), realizing either \({{\sf WA}^{u}}\), \({{\sf WA}^{r}}\), or \({{\sf WA}^{c}}\). Specifically, for r (and c), we consider two semantically equivalent WsP: “add fake link” for \(\delta\)Phish, and “link wrapping” for Zenodo. Such WsP attempt to balance the object ratio: the former by adding (invisible) links to (fake) internal objects, whereas the latter by eluding the preprocessing mechanism—thereby having a link not counted among the total links shown in a webpage.
– Advanced Website Attacks (\(\widehat{{\sf WA}}\)), which envision a more knowledgeable attacker than WA. The attacker knows how the feature extractor within the ML-PWD operates (i.e., they know the specific thresholds used to compute some features). The attacker—who is still confined to the website-space—will hence craft more sophisticated WsP, because they know how to generate an adversarial sample that is more likely to influence the ML-PWD. Thus, the attacker will modify either the URL, the HTML, or both (i.e., \(\widehat{{\sf WA}^{u}}\), \(\widehat{{\sf WA}^{r}}\), \(\widehat{{\sf WA}^{c}}\)), but in more elaborate ways—e.g., by ensuring that the HTML_objectRatio exactly resembles that of a “benign” sample, or by keeping a URL short enough to be considered “benign.”
– Preprocessing Attacks (PA), which are an extension of our threat model, and assume an even stronger attacker who is able to access the preprocessing stage of the ML-PWD, and hence introduce PsP. Such an attacker is capable of direct feature manipulation—subject to integrity checks (i.e., the result must reflect a “physically realizable” webpage). Since the attacker does not know anything about the actual \(\mathcal {M}\), they must still guess their PsP. Such PsP will target features based on either u, r, or c (i.e., \({{\sf PA}^{u}}\), \({{\sf PA}^{r}}\), \({{\sf PA}^{c}}\)), while accounting for inter-dependencies between features.
– ML-space Attacks (MA), representing a worst-case scenario. The attacker can access the ML-space of the ML-PWD, and can hence freely manipulate the entire feature representation of their webpage through MsP. However, the attacker is still oblivious of \(\mathcal {M}\), and must hence still guess their MsP. Thus, the MsP applied by the attacker completely “flip” many features related to u, r, or c (i.e., \({{\sf MA}^{u}}\), \({{\sf MA}^{r}}\), \({{\sf MA}^{c}}\)).
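To keep the taxonomy at hand, the 12 attacks are simply the Cartesian product of the four families and the three feature targets. The snippet below is a purely illustrative enumeration (the string identifiers are ours, not those of our codebase):

```python
from itertools import product

# The four attack families (at increasing attacker capability) and the three
# feature targets: u (URL), r (HTML representation), c (combination of both).
FAMILIES = ["WA", "WA_hat", "PA", "MA"]
TARGETS = ["u", "r", "c"]

# The 12 considered evasion attacks, e.g., ("WA", "u") denotes WA^u.
ATTACKS = list(product(FAMILIES, TARGETS))
assert len(ATTACKS) == 12
```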
Motivation. We consider these 12 attacks for three reasons. First, to assess the effects of diverse evasion attacks at increasing “cost.” For instance, the simplicity of WA makes them the most likely to occur; whereas MA can be disruptive, but are very expensive (from the attacker’s viewpoint). Second, to study the response of ML-PWD to WsP targeting the same features (\({{\sf WA}^{r}}\)), but in different ways (one per dataset), leading to alterations of different features beyond the attacker’s knowledge. Third, to highlight the effects of potential “pitfalls” of related research. Indeed, we observe that all three remaining families (\(\widehat{{\sf WA}}\), PA, MA) envision attackers with similar knowledge, which they use to target similar features. Such peculiarity allows comparing attacks carried out in different “spaces.” A particular focus is on PA, for which we apply PsP by anticipating how a WsP can yield a physically realizable [92] PsP. Put differently, our evaluation shows what happens if the perturbations are applied without taking into account all preprocessing operations that transform a given x into the \(F_x\) analyzed by \(\mathcal {M}\).
Effectiveness and Affordability. In terms of effectiveness, assuming the same targeted features, WA \(\lt\) \(\widehat{{\sf WA}}\) \(\lt\) PA \(\ll\) MA (as confirmed by our results in Section 7.2). This is justified by the higher investment required by the attacker, who must either perform extensive intelligence gathering campaigns (to understand the exact feature extractor for \(\widehat{{\sf WA}}\)) or gain write-access to the ML-PWD (for PA and MA). Let us provide a high-level summary of the requirements to implement all our attacks—all of which are query-less and rely on blind perturbations.
– WA: they require as little as a dozen lines of elementary code, and a very rough understanding of how ML-PWD operate (which can be acquired, e.g., by reading research papers).
– \(\widehat{{\sf WA}}\): they also require a few lines of code to implement. However, determining the exact thresholds requires a detailed intelligence gathering campaign (or many queries to reverse-engineer the ML-PWD, if it is client-side).
– PA: they require a compromise of the ML-PWD. For example, introducing a special “backdoor” rule stating that “if a given URL is visited, then do not compute its length and return that the URL is short.” Doing this is costly, but it is not infeasible if the feature extractor is open-source (e.g., Reference [22]).
– MA: they also require a compromise of the ML-PWD. In this case, the “backdoor” is introduced after all features have been computed—and irrespective of their relationships. Hence, the cost is very high: the ML model is likely to be tailored for a specific environment, thereby increasing the difficulty of successfully introducing such backdoors in one of the deepest segments of the ML-PWD.
Hence, in terms of affordability: WA \(\gg\) \(\widehat{{\sf WA}}\) \(\gg\) PA \(\gt\) MA (i.e., the relationship is the reverse of the effectiveness). For this reason, in our evaluation, we will put a greater emphasis on WA, because “cheaper” attacks are more likely to occur in the wild: while WA can be associated with “horizontal phishing” (the majority), the others are tailored for “spear phishing” (the minority).
6.2 Evaluation Workflow
The procedure to assess the adversarial attacks involves three steps:
(1) Isolate. Our threat model envisions evasion attacks that occur during inference, hence our adversarial samples are generated from those in \(P_i\). Furthermore, we recall that the attacker expects the ML-PWD to be effective against “regular” malicious samples. To meet such condition, we isolate 100 samples from \(P_i\) that are detected successfully by the best ML-PWD (typically using \(F^c\)) during one of our runs. Such samples are then used as a basis to craft the adversarial samples corresponding to each of the 12 considered types of evasion attacks.
(2) Perturb. We apply the perturbations as follows. For WA and \(\widehat{{\sf WA}}\), we craft the corresponding WsP, apply them to each of the 100 samples from \(P_i\), and then preprocess such samples by using the feature extractor. For PA and MA, we first preprocess the 100 samples with the feature extractor, and then apply the corresponding PsP or MsP. Overall, these operations result in 1,200 adversarial samples (given by 12 attacks, each using 100 samples).
(3) Evade. The 1,200 adversarial samples are sent to the 9 ML-PWD (for each dataset), and we measure the tpr again.
The expected result is that the tpr obtained on the adversarial samples (generated as a result of any of the 12 considered attacks) will be lower than the tpr on the original 100 phishing samples.
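For clarity, the three steps can be condensed into the following Python sketch. All identifiers (evaluate_attacks, extract, ml_pwd, wsp_attacks, fsp_attacks) are illustrative placeholders rather than names from our repository; the two loops mirror the different point at which WsP (before feature extraction) and PsP/MsP (after feature extraction) are applied.

```python
# Minimal sketch of the Isolate-Perturb-Evade workflow (Section 6.2).
# All identifiers are illustrative placeholders, not those of our codebase.

def tpr(ml_pwd, feature_vectors):
    """Fraction of (adversarial) phishing samples still detected as phishing."""
    hits = sum(ml_pwd.predict(F) == "phishing" for F in feature_vectors)
    return hits / len(feature_vectors)

def evaluate_attacks(P_i, extract, ml_pwd, wsp_attacks, fsp_attacks, n=100):
    """wsp_attacks: dict of WsP applied to the raw webpage (WA, WA-hat);
       fsp_attacks: dict of PsP/MsP applied to the feature vector (PA, MA)."""
    # (1) Isolate: keep n phishing samples that the ML-PWD detects correctly.
    base = [p for p in P_i if ml_pwd.predict(extract(p)) == "phishing"][:n]

    results = {}
    for name, wsp in wsp_attacks.items():
        # (2) Perturb the raw webpage, *then* run the feature extractor.
        adv = [extract(wsp(p)) for p in base]
        results[name] = tpr(ml_pwd, adv)           # (3) Evade
    for name, fsp in fsp_attacks.items():
        # (2) Run the feature extractor, *then* perturb the feature vector.
        adv = [fsp(extract(p)) for p in base]
        results[name] = tpr(ml_pwd, adv)           # (3) Evade
    return results
```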
6.3 Attacks Implementation
Let us discuss how we implement our perturbations, and provide some insight as to which features are influenced as a result of our attacks. We recall that each attack family presents three variants, depending on which features the attacker is “consciously” trying to affect: namely u, r, and c, i.e., features involving the URL, the representation (HTML), or a combination thereof. All attacks are created by manipulating (phishing) samples taken from \(P_i\). In particular, during our first trial, we isolate 100 samples from \(P_i\) that are correctly detected by the best ML-PWD: such samples are then used as the basis for all their adversarial variants (to ensure consistency). We will denote any such sample as p.
We start by describing MA, which are the easiest to implement. Then, we describe WA and \(\widehat{{\sf WA}}\). Finally, we describe PA, which are the most complex to implement, because they must consider several implications (e.g., inter-feature dependencies). (Our repository includes the exact implementation of MA and PA, and also all the pre-processed variants of the samples generated via WA and \(\widehat{{\sf WA}}\).)
6.3.1 ML-space attacks.
These attacks (i.e., MA) are the easiest to implement. Indeed, we simply follow the same procedure as done by most prior works (e.g., References [33, 59]) that directly manipulate the feature representation \(F_p\) of a sample p right before it is analyzed by the ML-PWD. We do this without taking into account any inter-dependency between features and/or any physical property that the actual webpage must preserve: this is compliant with our assumption that the attacker has access to the ML-space. Specifically, for each MA, we apply the following MsP:
– \({{\sf MA}^{u}}\): The attacker targets URL-related features. Hence, we manipulate \(F_p\) by setting the features based on \(F^u\) equal to \(-1\), which denotes a value that is more likely associated with a benign sample. In particular, we set to \(-1\) the features in Table 1 with the following numbers: (1–17, 19–21, 27, 30–35).
– \({{\sf MA}^{r}}\): Same as above, but the targeted features are within \(F^r\). Hence, we set to \(-1\) the features in Table 1 with the following numbers: (36–40, 42–52, 54–57).
– \({{\sf MA}^{c}}\): We set to \(-1\) all features involved in \({{\sf MA}^{u}}\) and \({{\sf MA}^{r}}\).
We remark that the attacker is not aware of the feature importance (because it would require knowledge of \(\mathcal {M}\)). Hence, although some manipulations will likely “move” \(F_p\) toward a benign webpage, it is not guaranteed that \(\mathcal {M}\) will actually classify such \(F_p\) as benign: if the manipulated features are not important, then even MsP may have no effect (and such phenomenon does happen in our evaluation, e.g., the ML-PWD using RF with \(F^c\) on Zenodo against \({{\sf MA}^{r}}\)).
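As an illustration, the MsP above boil down to an index-wise overwrite of the feature representation. The following sketch shows this operation under the assumption that \(F_p\) is stored as a NumPy array whose positions follow the numbering of Table 1; the index lists are those reported above, while every other identifier is a placeholder.

```python
import numpy as np

# Feature numbers from Table 1 targeted by each MsP (1-indexed, as in the paper).
MA_U = list(range(1, 18)) + list(range(19, 22)) + [27] + list(range(30, 36))
MA_R = list(range(36, 41)) + list(range(42, 53)) + list(range(54, 58))
MA_C = MA_U + MA_R

def apply_msp(F_p: np.ndarray, targeted: list[int]) -> np.ndarray:
    """Set the targeted features to -1 (the 'benign-looking' value),
    ignoring inter-feature dependencies -- as allowed in the ML-space."""
    F_adv = F_p.copy()
    F_adv[[i - 1 for i in targeted]] = -1   # convert to 0-indexed positions
    return F_adv

# Example: perturb a toy feature vector of a phishing sample
# (here we assume 57 features, i.e., the highest number referenced in Table 1).
F_p = np.ones(57)
F_adv = apply_msp(F_p, MA_U)   # MA^u variant
```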
Of course, we could set all features to \(-1\) (e.g., all \(F^u\) and \(F^r\)). Doing this, however, would obviously result in a perfect misclassification (and hence not interesting to show). Moreover, it would not be sensible even for the attacker. Indeed, MA assume no knowledge of \(\mathcal {M}\) and of \(\mathcal {D}\), meaning that an attacker may suspect the existence of a honeypot [83]. For instance, \(\mathcal {D}\) may contain some samples with all features set to \(-1\) (i.e., benign) that are labelled as phishing—for the sole purpose of defeating similar attacks in the ML-space. Hence, it is realistic to assume that even an attacker capable of MA would not exaggerate with their perturbations.
6.3.2 Website attacks.
We recall that we performed two families of attacks in the website-space: WA and \(\widehat{{\sf WA}}\). The peculiarity of these two families (both relying on WsP) is that the attacker does not have access to the ML-PWD. Hence, they are not able to manipulate \(F_p\), and they are not even able to observe \(F_p\).
• WA: These attacks resemble the pragmatic example (discussed in Section 4.5). Let us elaborate:
– \({{\sf WA}^{u}}\): We set the URL to a random string starting with “www.bit.ly/,” followed by seven randomly chosen characters (which is what this popular URL shortener does).
– \({{\sf WA}^{r}}\): For \(\delta\)Phish, we change the HTML by adding 50 invisible internal links (i.e., having the same root domain of the website); for Zenodo, we wrap all links within an “onclick,” i.e., we change <a href='link'> into <a onclick="this.href='link'"> (a sketch of both manipulations is provided at the end of this subsection).
– \({{\sf WA}^{c}}\): We do both of the above for each dataset.
• \(\widehat{{\sf WA}}\): These attacks envision an attacker that knows how the feature extractor within the ML-PWD operates (see Section 5.1.3). Such knowledge can be acquired, e.g., if the attacker has (or is) an insider that provided them with such intelligence. However, the attacker is still confined to the website-space, and hence can only apply WsP (to generate \(\overline{p}\)). For a meaningful comparison, we assume an attacker who is aware of how the features targeted in WA are “extracted” within the ML-PWD. Hence, we craft each \(\widehat{{\sf WA}}\) as follows:
– \(\widehat{{\sf WA}^{u}}\): The attacker, having knowledge of the extractor, knows that by using a URL shortener they will affect all features related to the URL (i.e., \(F^u\)); furthermore, they know the threshold (53) below which a URL is considered “benign.” Such length is well above that of a URL generated via any shortening service. As such, these attacks are an exact replica of \({{\sf WA}^{u}}\) (the only difference is that the attacker of \(\widehat{{\sf WA}^{u}}\) is more confident than the one of \({{\sf WA}^{u}}\)).
– \(\widehat{{\sf WA}^{r}}\): The attacker manipulates the HTML in the same way as in \({{\sf WA}^{r}}\). However, the attacker also knows the threshold (0.15) of internal-to-external links that yields a benign value of the HTML_objectRatio feature. Hence, the WsP manipulate the HTML of each p by introducing as many links (or wrappings) as necessary to meet such threshold.
– \(\widehat{{\sf WA}^{c}}\): The attacker does both of the above.
We stress that the attacker cannot observe \(F_{\overline{p}}\). Indeed, doing this would require the attacker to completely replicate the feature extractor, which is costly, and may not even be possible (some third-party services may require subscriptions to be used). As such, the attacker is aware of how to craft WsP that are more likely to influence the ML-PWD, but evasion is not guaranteed.
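To make the above concrete, the following sketch shows how such WsP could be generated. It is a minimal illustration under our assumptions (a bit.ly-style URL for u, and the two HTML manipulations for r), not the exact code of our testbed; all function names are placeholders.

```python
import random
import re
import string

def wsp_url() -> str:
    """WA^u / WA-hat^u: a bit.ly-style URL (7 random characters),
    well below the 53-character length threshold."""
    suffix = "".join(random.choices(string.ascii_letters + string.digits, k=7))
    return "www.bit.ly/" + suffix

def wsp_add_fake_links(html: str, root_domain: str, n: int = 50) -> str:
    """WA^r on deltaPhish: append n invisible links pointing to (fake) internal
    objects, nudging the internal/external object ratio. For WA-hat^r, n would
    instead be chosen so that the (known) 0.15 threshold is met."""
    fake = "".join(
        f'<a href="https://{root_domain}/obj{i}" style="display:none"></a>'
        for i in range(n)
    )
    return html.replace("</body>", fake + "</body>")

def wsp_wrap_links(html: str) -> str:
    """WA^r on Zenodo: wrap every link in an onclick handler so that a naive
    extractor no longer counts it among the page's links."""
    return re.sub(
        r'<a\s+href=["\'](.*?)["\']',
        "<a onclick=\"this.href='\\1'\"",
        html,
    )
```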
6.3.3 Preprocessing attacks.
These attacks are the hardest to realize in a fair way from a research perspective.
Challenges. The underlying principle of PsP (the backbone of PA) is affecting the preprocessing space of the ML-PWD. Technically, since we are the developers of our own feature extractor (i.e., the component of the ML-PWD devoted to data preprocessing), we could simply manipulate our own extractor directly, i.e., by introducing a “backdoor.” However, doing this would prevent a fair generalization of our results: for instance, it is possible to develop another feature extractor having the same functionality but whose operations are executed in a different order. Hence, to ensure a fairer evaluation, we apply the perturbations at the end of the preprocessing phase, but we do so by anticipating how a perturbation in the website-space (a WsP) could affect the preprocessing-space, thereby turning a WsP into a “physically realizable” PsP. To this purpose, we assume the viewpoint of an attacker. For instance, we ask ourselves: “If an attacker wants to affect URL features by using a URL shortener, how would the feature extractor react?”
Scenario. In PA, the attacker knows and can interfere (through PsP) with the feature extraction process of the targeted ML-PWD. However, the attacker is not aware of what happens next: the ML-space and the output-space are both inaccessible to the attacker (from both a read and a write perspective). Hence, once the PsP has been applied and \(\overline{F_p}\) is generated, the attacker cannot influence \(\overline{F_p}\) any longer. For each PA, we do the following:
– \({{\sf PA}^{u}}\): We anticipate an attack that targets URL features, and specifically URL_length, by using a URL shortener. Hence, we can foresee that such an operation (in the website-space) leads to alterations of all the features involving the URL (i.e., \(F^u\)). For instance, doing this would make weird characters (if present) disappear from the URL. However, doing this would also induce alterations to \(F^r\). For instance, some objects originally considered to be “internal” would become “external.” Hence, we implement \({{\sf PA}^{u}}\) by setting the following features (from Table 1) to \(-1\): (1–3, 5, 6, 8, 10–16, 22, 23, 25, 26, 28–30), whereas the following features are set to \(+1\): (4, 27, 36–38, 41, 44, 48, 52, 54, 56).
– \({{\sf PA}^{r}}\): We anticipate an attack that targets features related to the representation of a website—in our case the HTML, and specifically the HTML_objectRatio feature. We foresee that an attacker can interfere with such a feature in many ways, for instance by removing links, adding new ones, or changing those already contained in the webpage. All such changes will affect many features, such as HTML_freqDom: populating the HTML with (fake) internal links would change the “frequent domains” included in the HTML. Such changes can also affect the links in the footer of the webpage (HTML_nullLnkFooter), or the anchors (HTML_anchors), but also others. We implement \({{\sf PA}^{r}}\) by setting the following features (from Table 1) to \(-1\): (36–38, 41, 51, 54, 56, 57), whereas we set (39, 40) to \(+1\) and 46 to 0.
– \({{\sf PA}^{c}}\): They are a combination of the two above. We expect the attacker to use a URL shortener, and also to interfere with the HTML_objectRatio. However, we cannot simply set the features to the same values as in \({{\sf PA}^{r}}\) and \({{\sf PA}^{u}}\), because one of the two will prevail. In our case, shortening the URL will be “stronger,” because the URL will change (to that of the URL shortener) and hence the internal objects will become “external.” Hence, we implement \({{\sf PA}^{c}}\) by setting the following features (from Table 1) to \(-1\): (1–3, 5, 6, 8, 10–16, 22, 23, 25, 26, 28–30), whereas the following features are set to \(+1\): (4, 27, 36–38, 41, 44, 48, 52, 54, 56).
We remark that our PsP may not yield an \(\overline{F_p}\) that is a perfect match with an \(F_{\overline{p}}\) generated via WsP (i.e., those of \(\widehat{{\sf WA}}\)). Indeed, some inconsistencies may be present—likely due to “inaccurate” anticipations from our (i.e., the attacker’s) side. Such inconsistencies are sensible. An attacker with access to the preprocessing-space could theoretically replicate the entire feature extractor, and use it to pinpoint exactly how to generate PsP that are an exact match with WsP (i.e., \(\overline{F_p} = F_{\overline{p}}\)). However, doing this would be very expensive. Furthermore, it would defeat the purpose of using PsP: the attacker does not want \(\overline{F_p} = F_{\overline{p}}\); rather, they want a PsP that is “stronger”; otherwise, why use PsP in the first place?
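A minimal sketch of how such PsP can be encoded is given below. As in the MsP sketch of Section 6.3.1, it assumes that the feature vector follows the numbering of Table 1; the value assignments are those listed above, and every other identifier is illustrative.

```python
import numpy as np

# PsP specifications: feature numbers from Table 1 mapped to the value they are
# forced to, anticipating how a WsP would propagate through the extractor.
PA_U = {
    -1: [1, 2, 3, 5, 6, 8, *range(10, 17), 22, 23, 25, 26, 28, 29, 30],
    +1: [4, 27, 36, 37, 38, 41, 44, 48, 52, 54, 56],
}
PA_R = {
    -1: [36, 37, 38, 41, 51, 54, 56, 57],
    +1: [39, 40],
     0: [46],
}
PA_C = PA_U  # URL shortening "prevails," so PA^c reuses the PA^u assignments

def apply_psp(F_p: np.ndarray, psp: dict[int, list[int]]) -> np.ndarray:
    """Overwrite the targeted features with their anticipated post-WsP values."""
    F_adv = F_p.copy()
    for value, feature_numbers in psp.items():
        F_adv[[n - 1 for n in feature_numbers]] = value  # 1-indexed -> 0-indexed
    return F_adv
```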
9 Related Work
Countering phishing is a long-standing security problem, which can be considered as a subfield of cyberthreat detection—a research area that is being increasingly investigated also by adversarial ML literature [16]. We focus on the detection of phishing websites. Papers that consider phishing in social networks [25], the darkweb [101], phone calls [43], or emails [37] are complementary to our work—although our findings can also apply to phishing email filters if they analyze the URLs included in the body text (e.g., Reference [42]). Our focus is on attacks against ML-PWD. For instance, Tian et al. [91] evade PWD that use common blacklists, and their main proposal is to use ML as a detection engine to counter such “squatting” phishing websites. Hence, non-ML-PWD (e.g., Reference [102]) are outside our scope.
Let us compare our article with existing works on evasion attacks against ML-PWD. We provide an overview in Table 6, highlighting the main differences between our article and the state of the art. Only half of the related papers craft their attacks in the problem-space—which requires modifying the raw webpage. Unfortunately, most publicly available datasets do not allow similar procedures. A viable alternative is composing ad hoc datasets through public feeds, as done, e.g., by References [40] and [82] (the latter only for URL-based ML-PWD). All these papers, however, do not release the actual dataset, preventing reproducibility and hence introducing experimental bias. The authors of Reference [87] share their dataset, but while the malicious websites are provided with complete information (i.e., URL and HTML), the benign websites are provided only with their URL—hence preventing complete reproducibility of attacks in the problem-space against ML-PWD inspecting the HTML. The latter is a well-known issue in related literature [74], which does not affect our article, because our entire evaluation is reproducible. Notably, Aleroud et al. [12] evaluate attacks both in the problem- and feature-space, but on different datasets, preventing a fair comparison. Indeed, they evade one ML-PWD trained on PhishStorm (which only includes raw URLs) with attacks in the problem-space; and another ML-PWD trained on UCI (which is provided as pre-computed features) through feature-space attacks. Hence, it is not possible to compare these two settings. A similar issue affects also Reference [11], which considers four datasets, each having a different \(F\). Therefore, no prior work compared the impact of attacks carried out in distinct evasion-spaces—to the best of our knowledge. Not many papers consider adversarially robust ML-PWD, and only half consider both SL and DL algorithms—which our evaluation shows to respond differently against adversarial examples (cf. Section 7.2). It is also concerning that most papers overlook the importance of statistically significant comparisons. The most remarkable effort is Reference [85], which performs only 10 trials (we do 50), which are not enough to compute precise statistical tests.
Most prior work assumes stronger attackers than those envisioned in our threat model (cf. Section 4). Indeed, past threat models portray black-box attackers who can freely inspect the output-space and query the ML-PWD (e.g., References [11, 61, 82]); or white-box attackers who perfectly know the target ML model \(\mathcal {M}\), such as its configuration, its training data \(\mathcal {D}\), or the feature importance (e.g., References [9, 40, 63]). The only papers considering attackers that are closer to our threat model are References [59, 72] and Reference [9]. However, the ML-PWD considered in Reference [9] is specific to images, which are tough to implement (cf. Section 7.3) and also implicitly resembles an ML system for computer vision—a task well-investigated in adversarial ML literature [24]. In contrast, the ML-PWD considered in References [59] and [72] are similar to ours, but the adversarial samples are randomly created in the feature-space, hence requiring an attacker with write-access to the internal ML-PWD workflow. Such an assumption is not unrealistic, but it is very unlikely in the context of phishing (cf. Section 4.3).
10 Conclusions
We aim to provide a constructive step toward developing ML systems that are secure against adversarial attacks.
Specifically, we focus on the detection of phishing websites, which represent a widespread menace to information systems. Such context entails attackers that actively try to evade “static” detection mechanisms via crafty, but ultimately simple tactics. Machine learning is a reliable tool to catch such phishers, but ML is also prone to evasion. However, realizing the evasion attempts considered by most past work requires a huge resource investment—which contradicts the very nature of phishing. To provide valuable research for ML security, the emphasis should be on attacks that are more likely to occur in the wild. We set this goal as our primary objective.
After dissecting the architecture of ML-PWD, we propose an original interpretation of attacks against ML systems by formalizing the evasion-space of adversarial perturbations. We then carry out a large evaluation of evasion attacks exploiting diverse “spaces,” focusing on those requiring fewer resources to be staged in reality.