On the Web, there exists an ongoing ranking competition: authors of some Web pages may manipulate their documents so as to have them ranked high for many queries [23]. While traditional retrieval is performed on a relatively static corpus snapshot, the Web environment has become competitive [41]. As more and more NRMs are deployed in real-world applications, this competitive effect needs to be taken into account when designing NRMs so that they better fit practical search scenarios. However, little attention has been paid to this effect on NRMs. We therefore make the first attempt to propose an adversarial attack on NRMs that simulates this real-world competitive search [41].
In what follows, we first introduce the WSRA task against NRMs and then describe different adversarial attack settings for the task.
3.1 Task Description
Typically, given a query \(q\) and a set of candidate documents \(\mathcal {D} = \lbrace d_1, d_2, \ldots , d_N \rbrace\) selected from a document collection \(\mathcal {C}\) (\(\mathcal {D} \subseteq \mathcal {C}\)), a ranking model \(f\) aims at predicting the relevance scores \(\lbrace f(q,d_n) \mid n=1,2,\ldots ,N\rbrace\) between the query and every candidate document in order to rank the whole candidate set. For example, the ranking model outputs the ranked list \(L = [d_N, d_{N-1},\ldots , d_1]\) if it determines \(f(q,d_N) \gt f(q,d_{N-1}) \gt \cdots \gt f(q,d_1)\).
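To make this setup concrete, the following minimal Python sketch ranks a candidate set with a generic scoring function; `score` is a hypothetical stand-in for the NRM \(f\), not a function from any particular library.

```python
from typing import Callable, List, Tuple

def rank_documents(
    query: str,
    docs: List[str],
    score: Callable[[str, str], float],  # hypothetical stand-in for the NRM f(q, d)
) -> List[Tuple[str, float]]:
    """Score every candidate document and sort by descending relevance."""
    scored = [(doc, score(query, doc)) for doc in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```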
Based on this setup, the WSRA task aims at fooling NRMs into promoting a target document in the rankings by replacing important words in its text with their synonyms in a semantic-preserving way. In particular, we assume that the attacker is inclined to select \(\mathcal {D}\) from the top-ranked documents, as the ranked lists returned to clients are usually “truncated” (i.e., only the top-ranked documents will be shown).
In fact, to promote a target document in rankings, there exist multiple ways to design an imperceptible perturbation to the document, e.g., (1) character-level modifications; (2) deleting, adding, or swapping words; and (3) word substitution using semantically similar words. The first two ways are likely to break the grammaticality and naturality of the original input document, and thus can be easily detected by a spell or grammar checker [70]. In contrast, the third way substitutes words with semantically similar words, which preserves semantic consistency and language fluency to the greatest extent and is often indistinguishable from legitimate text for human observers [18]. Therefore, such word substitutions are a fundamental stepping stone towards identifying the vulnerability of ranking models and helping improve their robustness, and they are the focus of this work. That is, the WSRA task aims at promoting a target document in rankings by replacing important words in its text with their synonyms.
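As an illustration only (not the attack method studied in this article), the following greedy sketch shows how such a substitution attack could be organized; `get_synonyms` is a hypothetical helper (e.g., backed by a thesaurus or word embeddings), and `score` is the stand-in for the NRM from the sketch above.

```python
from typing import Callable, List

def word_substitution_attack(
    query: str,
    doc_tokens: List[str],
    score: Callable[[str, str], float],        # stand-in for the NRM f(q, d)
    get_synonyms: Callable[[str], List[str]],  # hypothetical synonym source
    max_subs: int = 5,
) -> str:
    """Greedily replace one word at a time with the synonym that most
    increases the relevance score; stop when no swap helps."""
    tokens = list(doc_tokens)
    for _ in range(max_subs):
        base = score(query, " ".join(tokens))
        best_gain, best_pos, best_word = 0.0, None, None
        for i, word in enumerate(tokens):
            for syn in get_synonyms(word):
                candidate = tokens[:i] + [syn] + tokens[i + 1:]
                gain = score(query, " ".join(candidate)) - base
                if gain > best_gain:
                    best_gain, best_pos, best_word = gain, i, syn
        if best_pos is None:  # no substitution improves the score
            break
        tokens[best_pos] = best_word
        # A real attack would additionally enforce the semantic similarity
        # constraint Sim(d, d_adv) >= epsilon introduced below.
    return " ".join(tokens)
```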
In this article, imperceptibility is reflected in two aspects. Firstly, the adversarial document should be semantically similar to the original document. Secondly, as a distinctive feature of adversarial attacks in IR, the adversarial document should easily escape spam detection. As a verification, we also asked human judges to qualitatively evaluate the imperceptibility.
Formally, given an original target document \(d\), the goal of an attack is to generate a valid adversarial example \(d^{adv}\) in the vicinity of \(d\) that is ranked higher by NRMs. Specifically, \(d^{adv}\) is crafted to conform to the following requirements:
\[ \operatorname{Rank}_{L}(q, d^{adv}) \lt \operatorname{Rank}_{L}(q, d) \quad \text{s.t.} \quad \operatorname{Sim}(d, d^{adv}) \ge \epsilon , \]
where the adversarial example \(d^{adv}\) can be regarded as \(d + p\), and \(p\) denotes the perturbation to \(d\). \(\operatorname{Rank}_{L}(q, d)\) and \(\operatorname{Rank}_{L}(q, d^{adv})\) denote the positions of the original \(d\) and its adversarial example \(d^{adv}\) in the ranked list \(L\) with respect to the query \(q\), respectively; a smaller rank position value represents a higher ranking. \(\operatorname{Sim}\) refers to the similarity function between the original \(d\) and its adversarial example \(d^{adv}\), and \(\epsilon\) is the minimum similarity. In the field of natural language, the universal sentence encoder (USE) [8] is often leveraged as the similarity function \(\operatorname{Sim}\): USE first maps the two inputs into vectors using a Transformer encoder and then computes their cosine similarity as the semantic similarity [36, 46, 48].
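As a concrete example, the following sketch computes this USE-based similarity with the publicly released TensorFlow Hub module; the module URL and the threshold check reflect common usage rather than the exact setup of this article.

```python
import numpy as np
import tensorflow_hub as hub

# Publicly released USE module on TensorFlow Hub.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def use_similarity(doc: str, doc_adv: str) -> float:
    """Cosine similarity between the USE embeddings of two documents."""
    vecs = embed([doc, doc_adv]).numpy()
    u, v = vecs[0], vecs[1]
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# The constraint Sim(d, d_adv) >= epsilon can then be checked directly, e.g.:
# is_valid = use_similarity(d, d_adv) >= epsilon
```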
Note that there are clear differences between the WSRA task and adversarial attacks in image retrieval and text classification: (1) The WSRA task needs to ensure that the perturbed document is semantically consistent with the original document by imposing a semantic similarity constraint, while attacks against image retrieval keep the pixel-level perturbations bounded within a budget. In essence, continuous image data is tolerant of perturbations to some extent, while discrete text data is not [22]; and (2) The WSRA task needs to promote the rank positions in a partially retrieved list, instead of causing a single adversarial sample to be misclassified as in text classification. Consequently, existing adversarial attacks against text classifiers are incompatible with text ranking models, and the WSRA task must be studied thoroughly in its own right.
Specifically, in this work we choose a BERT model fine-tuned on downstream search tasks as the target of the adversarial ranking attack, for the following reasons: (1) the pre-trained language model BERT has shown clear superiority on many text ranking problems [25, 38, 50, 54] in both academia and industry in recent years; and (2) previous studies have shown that it is challenging to adversarially attack a BERT model fine-tuned on downstream tasks, due to its strong performance [36].
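For reference, a fine-tuned BERT ranker of this kind is typically a cross-encoder that scores the concatenated query–document pair. The following sketch uses the Hugging Face transformers library with `bert-base-uncased` as a placeholder checkpoint; an actual attack target would be a model already fine-tuned on the search task.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint: a real target would be a BERT ranker already
# fine-tuned on the downstream search task.
MODEL_NAME = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=1)
model.eval()

def bert_score(query: str, doc: str) -> float:
    """Relevance score f(q, d) from a cross-encoder over [CLS] q [SEP] d [SEP]."""
    inputs = tokenizer(query, doc, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()
```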