Abstract
The current growth of machine learning (ML) has enabled the commercialization of several applications, such as data analytics, autonomous systems, and security diagnostics. These models are becoming pervasive in most systems and are deployed into every possible domain. Hand in hand with this growth are security and privacy issues. Although such issues are being actively researched, there is an evident fragmentation in the analysis and definition of ML models’ resilience. This work explores the resilience of shallow ML models to data poisoning, a relevant attack that poses a serious threat to model integrity and performance. Our study aimed to uncover the strengths of shallow ML models when facing adversarial manipulation. Evaluations were performed in a CAPTCHA scenario using the well-known MNIST dataset. Results indicate remarkable resilience, maintaining accuracy and generalization despite malicious inputs. Understanding the mechanisms enabling this resilience can aid in fortifying the security of future ML systems. Further research is needed to explore its limits and to develop effective countermeasures against sophisticated poisoning attacks.
1 Introduction
Historically, it was hypothesized that a computer could help a user predict an outcome or decide the next step. Recent advances in machine learning (ML) and the growth in computational capacity have transformed the technology landscape, bringing that hypothesis within practical reach [1].
The growth of ML models goes hand in hand with the complexity of the task and the nature of the data itself, requiring large volumes of data to train most models. Acquiring the necessary data can be cumbersome, especially in supervised learning tasks, as each data instance must be properly labeled. In some established tasks, such as computer vision or natural language processing (NLP), well-known curated datasets can be used as a starting point. However, for most tasks, there are no publicly available datasets.
One popular approach used to collect and label data is crowdsourcing, where a group of users (usually called workers) are tasked with a job, which they complete for a reward [2]. A platform like Amazon Mechanical Turk usually gathers the workers, and the job requester pays the platform for the services performed. Therefore, these platforms allow data gathering and labeling to be completed relatively quickly and at a reasonable cost.
Nonetheless, crowdsourcing is not always the solution, as some tasks require expert knowledge to provide correct labels (for example, tumor classification in radiography). Furthermore, this method has become a frequent attack vector in ML, where attackers use such platforms to insert data samples with incorrect features or labels, or even to trigger remote code execution [3].
As ML models are deployed in commercial and health-based services, authentication systems, root cause analysis, or threat actor detection, the data used to train and infer becomes more sensitive and may need to be handled with care. Privacy is also paramount when dealing with user data, especially data related to their health. Due to this, there is a broad and pressing call to advance the science of security and privacy in ML [4] so that private original data cannot be recovered from models.
Several researchers have studied the vulnerabilities (and possible mitigations [5]) present in a large variety of ML models and data collection methods [6]. However, this research is fragmented across several research communities, and there is no unified framework for security evaluation within ML models. Furthermore, although most studies present several possible exploits for ML models, it is not clear how resilient the models are or how effective such methods can be.
This work explored shallow ML model resilience to a data poisoning attack on data collected from a crowdsourcing platform. The scenario considered the collection of labels using a typical Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) that works as a crowdsourcing platform. The number of users answering CAPTCHAs is always 20, with the ratio between benign and malicious users being a parameter under study. A malicious user can only change the label attributed to a given sample, and it is assumed that the attacker knows which examples are known and which are unknown, correctly answering the known part while providing wrong labels for the unknown part. Since the attack focused on providing wrong labels for the training samples, only supervised learning models were considered. The results can be extrapolated to a higher number of users, as each user answered the same number of CAPTCHA challenges; consequently, the number of users considered only impacts the granularity of the results. The models are trained at various stages of the dataset collection process to understand how their performance evolves as the number of contaminated samples increases.
During our analysis, we focused on evaluating an overwhelmingly beneficial scenario for the attackers: one in which each poisoned data piece is added to the dataset without validation, allowing for a considerably fast and broad attack.
Even in that scenario, at least \(35\%\) of the population needed to be malicious in order to disrupt the prediction models’ performance.
When considering traditional mitigation strategies (where a data piece is only added to the dataset when there is a consensus), most algorithms remain resilient until malicious entities exceed \(50\%\) of the population.
The main contributions of the work are as follows:
- I) it provides a clear view of the resilience of shallow ML models in overwhelmingly disadvantageous scenarios,
- II) it presents a reliable and reproducible method to properly evaluate the resilience of ML models against data poisoning attacks,
- III) it discusses the impact of traditional mitigation strategies on these types of attacks and evaluates their effectiveness.
The remainder of the paper is organized as follows. Section 2 provides a brief overview of ML and its exploits, with the concepts necessary to understand the remaining document. The architecture of our scenarios is given in Sect. 3. Our analysis of the attacks’ feasibility and the discussion of the results are provided in Sects. 4 and 5, respectively. Finally, in Sect. 6, we present the conclusions of our overview and analysis.
2 Background and related works
With the rising popularity of ML solutions came the necessity to understand their vulnerability to the actions of malicious actors. This exposure required not only an understanding of the ML models (see Sect. 2.1) but also of the attacker profile, the domain, the target, and the techniques, tools, and processes used to conduct the attack itself.
2.1 Machine learning
ML is a branch of artificial intelligence (AI), by itself a broad research area. ML methods focus on prediction based on known properties learned from the training data.
Usual learning methods can be organized into a taxonomy based on the type of learning and output (see Fig. 1):
- Supervised learning: These models focus on learning a function that maps the inputs to one of the possible outputs. Examples of this category are support vector machines (SVMs) [7], decision trees (DTs) [8], and logistic regression (LR) [9].
- Unsupervised learning: Unsupervised learning focuses on finding hidden structures in unlabeled data, such as clustering or blind signal separation. Examples of this category are K-means [10] and principal component analysis (PCA) [11].
- Semi-supervised learning: It is similar to supervised learning but combines both labeled and unlabeled data; the label (output) is either present or missing in these data samples. These methods use the labeled data to predict the labels of unlabeled data and, with the complete dataset, learn a function to map the input into one of the possible outputs. An example of this category is the transductive SVM [12].
- Reinforcement learning: These methods consider the learning process as a continuous task: when given new evidence, the learning model is updated. An example of this category is Q-learning [13].
Given this work’s nature, we will focus mainly on supervised classification algorithms. In the following paragraphs, we will describe the models selected for this work (more details can be found in Sect. 3).
Eight models were used during the experiment: LR, SVM, K-nearest neighbors (k-NN), DT, artificial neural network (ANN), random forest (RF), voting classifier (soft), and voting classifier (hard). These models can be divided into individual (first five models) and ensemble models (the remaining). The individual models use a single model to obtain a label, while the ensembles combine the output from multiple models to achieve the outcome.
The simplest model is the LR. It estimates the probability of an event happening given a set of features. The output is between zero and one, indicating certainty of the event not happening and happening, respectively [14]. After optimizing the weights, predictions are considered zero if the output is below 0.5 and one otherwise.
The SVM is similar to LR but maximizes margins between the decision boundary and examples [15]. This feature makes SVM a better classifier than LR, with improved resilience to outliers and better-fitted curves. It can handle nonlinear separation using kernels, such as the Radial Basis Function (RBF) kernel.
The k-NN algorithm classifies examples based on the closest k terms using a voting system [15]. The class with the most votes becomes the label for the new data point. The performance depends on the number of neighbors and their influence on the decision.
The DT formulates class prediction as a tree structure where nodes verify features [14]. It’s easily explainable and versatile, with options to limit depth for a trade-off between accuracy and prediction rate. Factors like the splitting criterion and considered features influence the tree’s quality.
The classical ANN mimics the human brain using interconnected layers and nodes [14]. Training involves forward and backward propagation of signals to adjust connection weights. After training, new data points are propagated forward to predict classes.
Ensemble classifiers aim to create robust and well-rounded classifiers by combining models. In the experiments, two well-known bagging classifiers, the RF and voting classifier, were used.
The RF creates multiple DTs trained on different subsets of the data to reduce overfitting [15]. The DTs’ results are combined through a weighted vote based on probability estimates, using the class with the highest mean probability as the final prediction.
The voting classifier uses any desired models without subsets. It combines their outputs through soft or hard voting. Soft voting allows finer granularity and reduces the impact of low-certainty outputs but requires careful training to avoid bias.
The previously mentioned models require a considerable amount of labeled data to train. Acquiring ever-growing datasets is a tedious and complex task. Methods such as Crowdsourcing and Collaborative Sensing [16,17,18,19] distribute the acquisition task through several entities (mostly volunteers).
There are several examples of such data acquisition methods. One of the most well-known is User Preferences, where a user votes on products and services. Two large services employ this method: Netflix and Amazon. In these services, the users classify books, movies, and other products, allowing the service to learn how to predict which products would interest each user. Another well-known service is the CAPTCHA (see more details in Sect. 2.3), where a user has to input some data to advance on a web page.
Crowdsourcing and Collaborative Sensing allow offloading the required data acquisition to third parties, enhancing a well-curated internal dataset and training ML models with an increasing number of samples, improving robustness and accuracy. However, these acquisition methods also open the door for third parties to poison the data, degrade the performance of the subsequent ML models, or influence the outputs in their favor [20]. As previously mentioned, this work’s primary focus is to explore the resilience of ML models to data poisoning attacks conducted through Crowdsourcing and Collaborative Sensing.
2.2 Data poisoning
Data poisoning or model poisoning attacks [21] involve polluting a machine learning model’s training data. Data poisoning is considered an integrity attack because tampering with the training data impacts the model’s ability to output correct predictions. Unlike backdoor attacks [21], data poisoning aims to degrade overall model performance, affecting a wider range of models and tasks. It should be noted that specific attacks can use ML models to conduct remote code execution [3], which we consider to be out of the scope of our work.
While most attacks in the literature focus on a specific model, maximizing their effectiveness, some initial works employed a simpler approach: random or distance-based label switching [22, 23]. These early works remain valuable, as we explore such methods in the following sections. In the next paragraphs, we present relevant and recent attacks.
Some of the early works by Rubinstein et al. [24, 25] focused on anomaly detection using PCA dimensionality reduction. The attacker injects malicious data to launch a denial of service (DoS) attack, showing vulnerability in PCA-based anomaly detection models. They proposed a mitigation method named ANTIDOTE.
Another attack by Biggio et al. [26] used gradient ascent to attack SVM, finding a point that maximally decreases the SVM’s accuracy. However, the attack requires the attacker to control the label associated with the attack vector.
For deep learning (DL) models on graphs, unnoticeable perturbations in the graph structure can reduce graph neural networks (GNN) accuracy. Nettack [27] is a well-known graph convolutional network (GCN) attack, but a low-rank approximation of the graph using top singular components can defend against it [28]. Tensor-based node embeddings, projecting the graph into a low-rank subspace, are also robust to Nettack perturbations.
GNNGUARD [29] is a general algorithm to defend against training-time attacks that perturb the discrete graph structure. It identifies and quantifies the relationship between graph structure and node features, then adjusts edge weights to mitigate the attack’s adverse effects and enable robust neural message propagation in the GNN.
Exploring the problem of crowdsensing, similar to crowdsourcing, [30] proposes a new approach to data poisoning detection, highlighting the difficulties of successfully discovering these attacks when the malicious user employs sophisticated measures. However, the study focuses on a specific deep neural network (DNN) and not the general impact.
Considering the continual learning scenario, where the model does not have access to previously available data, [31] analyzes the impact of a targeted attack focused on changing input features to disrupt classification. Using a simple task-specific attack, the researchers can successfully disrupt model performance by adding noise to the input features used in the new training samples.
Looking at crowdsourcing platforms, [32] analyzes detection and mitigation strategies that remove and reduce the impact malicious workers can have during the data collection process. The results show that the proposed attacks can effectively disrupt models, and the defenses meaningfully reduce the attacks’ impact.
Focusing on various attacks, [33] analyzed the impact that label-flipping, data modification, data deletion, and sponge poisoning could have on a long short-term memory (LSTM). The results show that a data deletion attack presented the most effective approach to disrupting the considered model. However, every attack affected model performance.
From the perspective of actual systems and outside research, poisoning datasets is a long-known technique used even by human analysts. In standard defense engineering approaches, the baseline for detection and response must always consider a reference dataset.
The dataset can be used for training models that classify application types, malware from potentially unwanted applications (PUAs) or standard applications, or specific techniques. Most commonly, they are used to define a baseline for anomaly detection and overall user and entity behavior analysis (UEBA) mechanisms.
What is essential is that the construction of a baseline, representing a secure system, is paramount for the creation of any effective ML solution for cybersecurity. In the case of continuously adapting models, which is very important for organizations with heterogeneous sets of users, poisoning and other adversarial approaches are practical attacks to bypass defenses [34].
Regarding the impact poisoning attacks can have on ML models, to the best of the authors’ knowledge, the available literature focuses on analyzing the impact data poisoning can have on specific models [6, 33], on Federated Learning scenarios [35,36,37], or on how these attacks can be mitigated [30,31,32], leaving out the crucial analysis of the impact these attacks can have in crowdsourcing and collaborative scenarios where the attacker does not know the model being trained.
2.3 CAPTCHA
The term CAPTCHA was coined by Luis Von Ahn in 2003 and was used to characterize problems that could be implemented to tell humans and computers apart [38]. The idea of a CAPTCHA is to present a challenge that humans can quickly solve but current computers cannot. A typical example is identifying which set of images contains a random object.
There are various types of CAPTCHA, and they are grouped according to their kind of challenge. There are five types of CAPTCHAs: Text, Image, Audio, Video, and Puzzle-based [39]. Each has its advantages and disadvantages. Figure 2 presents the two most popular CAPTCHAs, the text-based CAPTCHA shown in Fig. 2a and the image-based CAPTCHA shown in Fig. 2b. Every human can easily recognize the first, even if the person does not speak or understand English, and it is easier to implement. Nevertheless, using DNNs, it can easily be broken in real time [40]. The second presents another trade-off. They are easier to solve but require a large dataset of labeled images.
In some approaches, besides human identification, the solutions use the CAPTCHA framework to label unknown data [41, 42]. Instead of presenting a challenge and validating if the user is or is not human, they provide a challenge where only part of the answer is known and used to identify bots. The response to the unknown portion is utilized as a label. This approach can be applied to the various types of CAPTCHAs implemented, with the two most popular applications being text and image-based CAPTCHAs.
This approach turns the CAPTCHA solutions into crowdsourcing platforms where the workers label data instances in exchange for access to a website. Furthermore, it provides an added security barrier compared to traditional crowdsourcing, as the attacker must answer correctly the known parts of the challenge. Poisoning attackers must distinguish the challenge’s known portions from the unknown; otherwise, the poisoning attempt is detected and discarded.
There are already some techniques to break CAPTCHA challenges [43, 44]. In applied terms, more than just bypassing the CAPTCHA, our primary concern is understanding the consequences of manipulating the dataset being collected for future challenges.
3 Adversarial demonstration architecture
Considering the background presented, although several attacks have been proposed and show a high degree of success in disrupting specific models, to the best of the authors’ knowledge, there is no study that provides a thorough analysis of how data poisoning attacks impact a model’s performance. This section describes the scenario developed to answer that data poisoning question. In this scenario, and without loss of generality, a text-based CAPTCHA is used to create a dataset for digit recognition. The following subsections provide details regarding the implemented CAPTCHA, how the attacker infiltrates the collection system, and the objectives of the system.
3.1 System model
The demonstration system is shown in Fig. 3. It generates a numeric CAPTCHA (i.e., six digits) from the training data to validate if the user is a human. The CAPTCHA has known and unknown digits. The known digits are the control to validate that a user is human. The unknown digits are crowdsourced data susceptible to being exploited for an attack. This system is very similar to the classical reCAPTCHA [41], where one word was known, and the other was unknown.
The unknown part is used to gather labels to train the classifier. The trained model can be tested using a separate dataset just for testing, arriving at a performance assessment metric. Although the system is designed to assess the resiliency against adversarial influence through data poisoning in unknown digits, this design could be used in production as a fully functioning regular CAPTCHA.
From a regular CAPTCHA perspective, the system allows resorting to voting systems to improve resilience against poisoning attacks, thus having better security. That can be achieved by sending the same unknown part of the CAPTCHA to N users, with N being an odd number, and only accepting a classification when one label has at least half the votes. Otherwise, it is maintained in the unknown pool and sent again to classification.
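As an illustration of this consensus rule, the following sketch (our own simplification, not code from the described system) accepts a crowdsourced label only when one answer reaches the required share of the N collected votes; otherwise the digit returns to the unknown pool:

```python
from collections import Counter

def consensus_label(votes, n_required):
    """Return the agreed label if one answer gathers at least n_required
    of the collected votes; otherwise None, so the sample goes back to
    the unknown pool for another round of classification."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= n_required else None

# Example with N = 3 votes and a majority requirement of 2.
print(consensus_label([7, 7, 1], n_required=2))  # -> 7
print(consensus_label([7, 1, 3], n_required=2))  # -> None (no consensus)
```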
3.2 Threat model
Defining the threat model is essential for the adversarial demonstration architecture. Figure 4 shows the threat model for our system. It contemplates three domains: the attacker, the online system, and the offsite performance evaluation. The attacker can only interact directly with the online system. The offsite domain assesses the model performance in the online system independently of the attacker’s domain. It is assumed the attacker does not explore any vulnerability other than those inherent to the CAPTCHA (e.g., a zero-day in the network stack or any attack tailored to a specific model is out of the scope).
In this case, the CAPTCHA has six digits. The first three digits have known labels and are used for the CAPTCHA process. The last three are unknown, and the answers provided will be the labels in the dataset. Since an ML model can perform number recognition with high accuracy and the system does not randomize the known and unknown parts, an attacker can easily surpass the CAPTCHA validation and poison the dataset. The lack of additional security procedures is purposely done to provide the best-case scenario for the attacker and to understand the damage that can be performed when an attacker can interfere with label collection freely.
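A minimal sketch of this challenge flow is shown below. It is our own illustration (the function names are hypothetical), keeping the fixed known/unknown positions described above: answers to the unknown digits are only kept when the known digits are answered correctly.

```python
import random

def build_challenge(known_pool, unknown_pool, n_known=3, n_unknown=3):
    """Assemble a six-digit CAPTCHA: the first digits come from the labeled
    pool (used to verify the user), the last ones from the unlabeled pool
    (their answers become crowdsourced labels)."""
    known = random.sample(known_pool, n_known)        # (image, true_label) pairs
    unknown = random.sample(unknown_pool, n_unknown)  # images without labels
    return known, unknown

def process_answer(known, unknown, answers):
    """Validate the known part; if it is correct, keep the crowdsourced labels."""
    known_answers = answers[:len(known)]
    unknown_answers = answers[len(known):]
    if any(a != label for a, (_, label) in zip(known_answers, known)):
        return None  # failed the CAPTCHA; nothing is added to the dataset
    return list(zip(unknown, unknown_answers))  # (image, crowdsourced label) pairs
```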
From the presented CAPTCHA challenge, it is quickly understood that the dataset’s task is digit identification. This task means that for each example, there are ten possible labels, and the features are the pixels of an image of a digit. Since we wanted to understand the performance degradation caused by label poisoning, we considered the MNIST dataset as the repository for the images presented in the CAPTCHA challenges. MNIST was used as it is a well-known dataset on which the evaluated models perform considerably well when there is no poisoning attack. The dataset contains 60,000 training images and 10,000 testing images. From the training data, 30% (18,000 examples) was kept as images with known labels. The remaining 70% (42,000) were considered images with unknown labels that the users will classify. The test dataset was kept unchanged so that we could measure the success of the poisoning attack.
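The split described above can be reproduced roughly as follows. This is a sketch under our own assumptions: we fetch MNIST through Scikit-learn’s OpenML interface and pick the 18,000 known examples at random, which may differ from the exact selection used in the experiments.

```python
import numpy as np
from sklearn.datasets import fetch_openml

# MNIST: 70,000 images of 28x28 = 784 pixels; the conventional split keeps
# the first 60,000 examples for training and the last 10,000 for testing.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_train, y_train = X[:60_000], y[:60_000].astype(int)
X_test, y_test = X[60_000:], y[60_000:].astype(int)

# 30% of the training data (18,000 examples) keeps its true labels (the
# "known" CAPTCHA digits); the remaining 70% (42,000) is treated as unknown
# and will receive labels from the (possibly malicious) crowd.
rng = np.random.default_rng(0)
order = rng.permutation(len(X_train))
known_idx, unknown_idx = order[:18_000], order[18_000:]
```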
To obtain more representative results of the ML models’ breakdown, we contemplated powerful attackers capable of flawlessly reading the CAPTCHA and reliably identifying which digits are known and which are crowdsourced.
Such a powerful attack within this formulation is unrealistic unless the attacker establishes an oracle in the system; however, considering this worst-case scenario is useful to understand how much the performance of the ML models can deteriorate. This choice represents the most favorable scenario for the attacker to compromise the ML system, allowing us to ascertain the risks of operating such ML-powered crowdsourcing systems.
3.3 Bot implementation
Since the attack considers malicious bots that blend in with human users, besides implementing malicious bots, it was also necessary to implement a benign bot simulating a human. This bot always responds correctly to both the known and unknown parts. The malicious bots always answer the known part correctly but answer the unknown part incorrectly in different ways. The three types of malicious bots are the malicious uncoordinated bot, the malicious coordinated bot, and the malicious coordinated and intelligent bot.
The malicious uncoordinated bot answers with any random value except the correct answer for each digit of the unknown part. The malicious coordinated bot acts in a coordinated manner by always responding with a switch behavior, replacing the correct value with the next integer. For example, if the correct digit is two, it answers three; if the digit is nine, it answers zero. This bot demonstrates coordination without intelligence, meaning there is no particular reason for choosing the output presented; the intent is simply to give the same answer collectively. Finally, the malicious coordinated and intelligent bot always answers with the most similar incorrect digit (flip behavior). For example, if the presented digit is six, it answers nine, and vice versa. This bot shows the impact of an attack where the attacker has some knowledge of how to disrupt the model training intelligently. The complete set of flips considered is presented in Table 1. The implemented malicious bots do not represent a sophisticated attack, as most sophisticated attacks are impractical in the real world and require some knowledge of the model that will be trained. Instead, we preferred implementing a realistic attack on an unsafe system and analyzing its consequences.
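The three poisoning behaviors can be summarized in a few lines. The sketch below is our own rendering: the flip mapping shows only the six/nine pair given as an example in the text (the complete mapping is in Table 1), and the fallback to the switch behavior for unmapped digits is purely our assumption.

```python
import random

def random_attack(true_digit):
    """Uncoordinated bot: answer any digit except the correct one."""
    return random.choice([d for d in range(10) if d != true_digit])

def switch_attack(true_digit):
    """Coordinated bot: always answer the next integer (nine wraps to zero)."""
    return (true_digit + 1) % 10

# Illustrative subset of the flip mapping; the paper's full mapping is in Table 1.
FLIP = {6: 9, 9: 6}

def flip_attack(true_digit):
    """Coordinated and intelligent bot: answer the most similar wrong digit."""
    return FLIP.get(true_digit, switch_attack(true_digit))  # fallback: our assumption
```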
3.4 Methods
The ML methods utilized are all shallow models. The absence of DNNs comes from two facts. The first is that shallow models can easily solve the task, so in a real-world environment, these models would be preferable to larger models with a more expensive training and inference process. The second is that these are considered simpler models, which intuitively would make them more susceptible to data poisoning attacks.
The shallow models are out-of-the-box implementations from Scikit-learn, a popular machine-learning library. So, unless stated otherwise, the models use the default hyperparameters.
One detail must be addressed for the LR and SVM, as they only work for two-class problems. Since digit identification is a ten-class problem, they cannot be directly applied. This problem is solved in Scikit-learn through the one-vs-all strategy, where a model is trained for every class, identifying one class as positive and the remainder as negative. After training every model, the strategy predicts the final class by choosing the model that outputs the value closest to one.
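For illustration, the one-vs-all decomposition can be made explicit with Scikit-learn’s OneVsRestClassifier, although recent versions of the library also handle multiclass problems internally; the choice of LinearSVC below is only an example, as the text does not state which SVM estimator was wrapped.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# One binary classifier per digit; the class whose classifier is the most
# confident (decision value closest to the positive side) wins.
ovr_lr = OneVsRestClassifier(LogisticRegression())
ovr_svm = OneVsRestClassifier(LinearSVC())
```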
Regarding the k-NN, since the value of k significantly impacts the model’s results, we considered various values for it \( k \in \{1, 3, 5, 7, 9\}\). In the remainder of the paper, the results considered \(k = 3\) as it presented good performance values at a reasonable inference time.
In the implemented classical ANN, some hyperparameters must be addressed. The ANN comprises three fully connected layers: the input, the hidden, and the output. The number of nodes in the input layer equals the number of features, in our case 784 (the number of pixels); the hidden layer contains 100 nodes (the default implemented by Scikit-learn); and the output layer has 10 nodes (the number of classes). The default values were also used for the activation functions: ReLU for the hidden layer and softmax for the output.
Finally, the models used in the voting classifier are LR, k-NN (\(k = 3\)), DT, and ANN. For the voting scheme, both options were tested to understand which presents better resilience to the poisoning attack.
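A sketch of how the eight classifiers described above could be instantiated is given below. It follows the defaults mentioned in the text (k = 3, a single hidden layer of 100 nodes, soft and hard voting over LR, k-NN, DT, and ANN); the dictionary structure and names are ours, not from the released code.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def build_models():
    """The eight shallow classifiers, using Scikit-learn defaults unless noted."""
    voting_members = [
        ("lr", LogisticRegression()),
        ("knn", KNeighborsClassifier(n_neighbors=3)),
        ("dt", DecisionTreeClassifier()),
        ("ann", MLPClassifier(hidden_layer_sizes=(100,))),
    ]
    return {
        "LR": LogisticRegression(),
        "SVM": SVC(),
        "k-NN (k=3)": KNeighborsClassifier(n_neighbors=3),
        "DT": DecisionTreeClassifier(),
        "ANN": MLPClassifier(hidden_layer_sizes=(100,)),
        "RF": RandomForestClassifier(),
        "Voting (soft)": VotingClassifier(voting_members, voting="soft"),
        "Voting (hard)": VotingClassifier(voting_members, voting="hard"),
    }
```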
3.5 Objectives
This system and threat model allow us to assess a dataset poisoning attack’s impact on an ML model’s performance when the dataset is collected from a real-world CAPTCHA data collection deployment.
To that extent, the objectives are:
- Understand how much of the dataset must be compromised to significantly reduce the model performance (below a Matthews Correlation Coefficient (MCC) of 0.7; more details on the breaking point in Sect. 4.1).
- Estimate how many compromised users/requests the attacker must deploy to reach the breaking point.
- Evaluate the impact of security measures (voting system).
- Analyze the impact of different types of general attacks on the model performance.
4 Evaluation
Having described the system, threat model, and proposed scenario in the previous section, many experiments could be performed. Therefore, the first subsection describes the experimental approach of this evaluation. To better understand how resilient the shallow ML methods are to a poisoning attack and how feasible the attacks are, we considered four independent variables that could affect model performance. We then present the results without any mitigation, followed by the results with one mitigation.
The code and experiment results are publicly available on GitHub.
4.1 The experiment
In the experiment, we considered four important independent variables to understand how much impact the malicious bots could have. The first is the number of malicious users deployed, the second is the type of attack performed, the third is the number of votes required to label an instance, and the fourth is how long it took for the bots to break the model. The idea is that the larger the number of bots, the less time it should take to break the model, as data is poisoned faster.
To simplify the data collection process, we always considered that the same 20 users interacted with the system in a given run, and each user answered the same number of challenges. This ensures that, whether 20 or 100 users are used, if the percentage of malicious users is the same, the same percentage of the dataset will be poisoned at the considered thresholds. Furthermore, as we intended to evaluate each type of attack individually, only one kind of malicious bot was deployed at a time. From the start of data collection, we had 20-N benign and N malicious users, with N ranging from 0 to 20. The case where N = 0 is considered the baseline, with every example in the dataset correctly labeled. The remaining cases analyze the effect of the first and second independent variables.
The analysis of the fourth independent variable is achieved through various snapshots of the dataset during the collection process. Every time 10% of the unknown examples (4200) are classified, a new dataset is created considering all previously collected data and the new 10%. Then, for each of these datasets, the eight shallow ML models are trained and their performance evaluated. The decrease in performance between the baseline and the results with the snapshot dataset shows the impact the malicious bots had until that point.
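A condensed sketch of this snapshot protocol is shown below (our own formulation; variable names are assumptions, and the released code may organize the loop differently):

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import matthews_corrcoef

def evaluate_snapshots(models, X_known, y_known, X_unknown, y_crowd,
                       X_test, y_test, n_snapshots=10):
    """Retrain every model on the known data plus an increasing share of
    crowdsourced (possibly poisoned) labels and record the test MCC."""
    results = {name: [] for name in models}
    step = len(X_unknown) // n_snapshots  # 10% of the unknown pool (4200 examples)
    for s in range(1, n_snapshots + 1):
        X = np.vstack([X_known, X_unknown[: s * step]])
        y = np.concatenate([y_known, y_crowd[: s * step]])
        for name, estimator in models.items():
            model = clone(estimator).fit(X, y)  # fresh model for each snapshot
            results[name].append(matthews_corrcoef(y_test, model.predict(X_test)))
    return results
```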
Finally, the voting process analysis is built on top of the remaining analysis, with the system using a voting system with three votes.
It is essential to remember that the models are always trained with the known labels used in the CAPTCHA (18,000 examples). This number means that even when there are 20 malicious users, and the dataset is fully collected, 30% of the dataset is still correctly classified.
The MCC [45] is used in ML to measure the quality of multiclass classification. MCC was used instead of the F1-score or other classification metrics since it considers the complete confusion matrix and is generally regarded as a balanced measure, being more resilient to unbalanced datasets. The MCC is essentially a correlation coefficient with values between \(-1\) and \(+1\): a coefficient of \(+1\) represents a perfect prediction, 0 represents an average random prediction, and \(-1\) represents an inverse prediction.
The breaking point is the moment when a given model always performs below a given threshold, either when the amount of data collected or the number of malicious users increases. The threshold considered was an MCC of 0.7 since, as a rule of thumb, in most real-world applications, an MCC above 0.7 indicates a robust classifier.
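Under this definition, the breaking point can be located from the per-snapshot MCC values; the helper below is a sketch of that rule (our own code, with an illustrative series rather than measured results):

```python
def breaking_point(mcc_series, threshold=0.7):
    """Index of the first snapshot from which the model never recovers
    above the threshold; None if the model never breaks."""
    for i in range(len(mcc_series)):
        if all(v < threshold for v in mcc_series[i:]):
            return i
    return None

# Illustrative series: the model breaks at the fourth snapshot (index 3).
print(breaking_point([0.95, 0.88, 0.74, 0.62, 0.55, 0.41]))  # -> 3
```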
As mentioned in Sect. 1, we explored the most advantageous scenarios for attackers. This is equivalent to the worst-case scenario for the ML models, where no mitigation methods are used to validate the crowd-sourced acquired labels. This allows for rapid and widespread contamination of the dataset. As it stands, this study represents a lower bound on the resilience of shallow models.
4.2 Results without a mitigation system
Analyzing the best-case scenario for the attacker, we can state that different attacks have varied impacts on the methods. Table 2 presents the breaking point of the models for each type of attack. The LR, SVM, and voting classifier (hard) have similar breaking points in the random and flip attack and break earlier in the switch attack. The ANN, k-NN, DT, RF, and voting classifier (soft) take longer to break using the random attack than the coordinated ones, with the RF never dropping below 0.7 MCC in the random attack.
These results are closely related to the capability of the models to deal with outliers. Since simple linear models like LR and SVM try to fit a decision boundary between the data points, even a few outliers can considerably impact the final decision boundary. This boundary shift, in turn, is reflected in the early decrease in performance. Although not a linear model, the DT suffers from a similar problem. Considering that it tries to find the best splitting points for each feature, having a few outliers can be enough to change the splitting decisions and degrade performance. On the other hand, ensemble classifiers like the RF and voting classifiers are usually more resilient to outliers, given that the outliers must affect every model that votes in the final decision. The k-NN works similarly and, therefore, follows a similar rationale. If we consider the neighboring points as votes, then the k-NN becomes a voting system, and as the number of votes increases, so does the model’s resilience to outliers.
Figures 5 and 6 present the best and worst-performing models’ results, respectively. The behavior seen in LR is very similar to that seen in the SVM and DT, as they show a linear decrease in performance as the dataset becomes more contaminated. On the other hand, the behavior seen in the RF, where the model presents a steady performance followed by a sudden drop, is seen in the other voting classifiers and the ANN. The k-NN behavior depends on the number of neighbors used, as the lower number of neighbors presents a behavior closer to the LR and the higher number a behavior closer to the RF. The k-NN with three neighbors presents a middle ground between the two as it does not drop performance linearly nor presents the same level of resilience.
As mentioned, these results are related to the different resilience to outliers. Since the ensemble models present a high resilience to outliers, only when a significant part of the data is poisoned does the model break. In the case of RF, the performance never drops below the threshold in the random attack, as 30% of correctly labeled data is enough for the model to distinguish between outliers and correct labels. However, when considering coordinated attacks, at a certain point, the amount of consistent incorrect labels becomes greater than the number of correct labels, and the model starts to consider the correct labels as outliers. At this point, the poisoning attack is successful, and the model’s performance decreases drastically, even dropping below the performance of the LR. On the other hand, the LR presents a steady decrease in performance, but it always regards the correct labels, hence the higher performance in the worst-case scenario.
4.3 Results with mitigation system
Table 3 shows that the mitigation system results are highly model-dependent. In the case of the k-NN, RF, and ANN, the voting system does not affect the breaking point, while in the LR, it can either help, as in the random and flip attacks, or even decrease performance, as in the switch attack. This result is easily understood because the voting system only reduces the number of misclassified labels. Since the ensemble methods already handled a small number of outliers well, the voting system provides no additional benefit. On the other hand, the linear models affected by the outliers see a significant increase in resilience.
This behavior can be seen in Figs. 7 and 8, where the LR performance no longer decreases linearly and the RF does not show a meaningful difference. Comparing the approaches carefully, one can see that the random attack on the LR with mitigation presents a steeper slope when approaching the worst-case scenario than without mitigation. This result is explained by the fact that once most users are malicious, the voting system works in their favor, as the correct classifications are in the minority and will not be accepted. This is similar to what happens in the ensemble methods, where most data is poisoned and the correct labels are treated as incorrect, giving the LR model a curve closer to that of an ensemble method.
Still considering the LR model, the benefits of the voting system are more easily noticed in the random attack, as the malicious users will sometimes work against each other, providing two different labels and not achieving consensus, giving the system another chance to select more benign users than malicious ones and correctly identify the digit. Since in both coordinated attacks the malicious users vote consistently, the curved effect noticed around 5 to 15 malicious users disappears, and the model presents a linear slope.
5 Discussion
While the experimentation shows that compromising the ML system with the three attack methods is possible, the results require further reflection on the practicality of such attacks. Even without any mitigation, the best attack strategy tested required at least 35% of compromised users and poisoning at least 10% of the dataset. These numbers are very selective of the target system or the attacker type that may succeed. If the target system is less popular or does not have that many users, achieving \(35\%\) compromised users is feasible. This may raise concerns for isolated deployments in organizations that, for the purpose of enhanced security, use their own hosted solutions; such systems will, in reality, be more vulnerable to attacks due to the low scale of the deployment. However, if it is a popular platform with tens of thousands of users (if not more), compromising that platform requires extensive botnet capabilities, coordination across bots, and sophistication to hide the sudden influx of bots performing adversarial actions, all while considering a threat model heavily stacked toward the attacker succeeding. We consider that this is not out of reach, as recent statistics indicate the existence of more than 8000 botnets [46]. Historically, several botnets have had several million devices, which presents a formidable capability toward practical attacks on ML through poisoning.
Even without any mitigation, some ML models perform vastly better than others. Ensemble classifiers have shown impressive resilience against the different attack types. In all experimented cases, the attacker requires more than \(50\%\) of compromised users. That means a sudden and very noticeable increase of users on the platform, with a vast majority of new users (i.e., the bots) performing adversarial actions against the system. Such a sudden influx would likely trigger different anomaly detection systems, require scaling up the resources, and initiate human verification of what was happening. However, with the rise of malicious actors supported by nation states, which are known to be highly motivated, to operate over long timescales, and potentially to affect other parts of the supply chain, even \(50\%\) may be within reach.
Therefore, we consider that data-poisoning ML models in production may be challenging, and more challenging than it may appear when reading about the poisoning techniques. However, when no mitigation systems are used, these attacks are within the reach of advanced actors operating botnets or over extended time scales.
Since the simple mitigation system improved the resilience of various models, it is also important to further analyze the effectiveness of mitigation strategies and how they can meaningfully increase the attacker’s effort.
6 Conclusion
The article explored how different ML models behave when faced with data poisoning attacks. Our experimental setup created a representative scenario of a production system with ML susceptible to a data poisoning attack while still favoring the attacker’s ability to complete that attack successfully. The experiment considered different poisoning techniques (random, switch, and flip) to better understand how each technique affects the ML model and further increases the attacker’s success probability (to determine the breaking point).
The experimental results were telling. Without any mitigation, the more straightforward ML approach (LR) required at least 35% of malicious users to perform the poisoning successfully. In the simpler models without mitigation, the breaking point happens after 10% of the dataset is contaminated. These numbers are challenging to achieve in larger-scale systems with many users but may be within the reach of botnet operators and advanced threat actors. When simple mitigations were introduced (i.e., a voting system), the ML models became even more resilient, increasing the need for a higher number of coordinated bots.
Therefore, while attacking an ML system in the wild is unfeasible in most scenarios, especially when relying on large-scale public infrastructures employing ML, the security of these systems should take into consideration additional factors. Also, additional layers of observability and model checking should be in place so that malicious actions are detected.
These results were gathered from a CAPTCHA-based scenario; however, the conclusions should be applicable to any scenario/platform that uses crowdsourcing to expand a dataset. As mentioned, data poisoning is a viable attack, which is impractical for small and medium-scale actors if the population is large enough. This holds true even without any mitigation methods and when considering only shallow learning models. Care should be taken if the system may be targeted by advanced persistent actors.
In future work, we intend to expand the proposed system to include poisoning attacks on the input features, which will allow us to evaluate supervised and unsupervised methods. We will also extend the analysis of mitigation strategies, especially state-of-the-art ones, as they will help us understand the impact in the best-case scenario for the model. Finally, we will also increase the number of datasets used, as it will help understand the resilience to attacks of the models when trained on different data distributions and data types.
Data Availability
All the outputs from the work are publicly available on the GitHub repository linked in the article.
References
Singh, V.K., Gupta, A.K.: From artificial to collective intelligence: perspectives and implications. In: 2009 5th International Symposium on Applied Computational Intelligence and Informatics, pp. 545–550 (2009)
Koita, T., Suzuki, S.: Crowdsourcing and its application for traffic survey work. In: 2019 IEEE 4th International Conference on Big Data Analytics (ICBDA), pp. 375–378, (2019)
Weaponizing ML models with ransomware. https://hiddenlayer.com/research/weaponizing-machine-learning-models-with-ransomware/, 2022. Accessed: 29-04-2024
Committee on Technology, National Science and Technology Council: Preparing for the Future of Artificial Intelligence. CreateSpace Independent Publishing Platform, North Charleston, SC, USA (2016)
Cunha, V., Corujo, D., Barraca, J., Aguiar, R.: TOTP Moving Target Defense for sensitive network services. Pervasive Mobile Comput 74, 101412 (2021). https://doi.org/10.1016/j.pmcj.2021.101412
Fan, J., Yan, Q., Li, M., Qu, G., Xiao, Y.: A survey on data poisoning attacks and defenses. In 2022 7th IEEE International Conference on Data Science in Cyberspace (DSC), pp. 48–55, (2022)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
Nelder, J.A., Wedderburn, R.W.M.: Generalized linear models. J. R. Stat. Soc. Ser. A (General) 135(3), 370–384 (1972)
Lloyd, S.: Least squares quantization in pcm. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
Pearson, K.: On lines and planes of closest fit to systems of points in space. Phil. Mag. 2(6), 559–572 (1901)
Bennett, K.P., Demiriz, A.: Semi-supervised support vector machines. In: Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, pp. 368–374, MIT Press, (1999)
Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)
Ray, S.: A quick review of machine learning algorithms. In: 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), pp. 35–39, (2019)
Singh, A., Thakur, N., Sharma, A.: A review of supervised machine learning algorithms. In: 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 1310–1315, (2016)
Garcia-Molina, H., Joglekar, M., Marcus, A., Parameswaran, A., Verroios, V.: Challenges in data crowdsourcing. IEEE Trans. Knowl. Data Eng. 28(4), 901–911 (2016)
Roh, Y., Heo, G., Whang, S.E.: A survey on data collection for machine learning: a big data - ai integration perspective. IEEE Trans. Knowl. Data Eng. 33(4), 1328–1347 (2021)
He, S., Shi, K., Liu, C., Guo, B., Chen, J., Shi, Z.: Collaborative sensing in internet of things: a comprehensive survey. IEEE Commun. Surv. Tutor. 24(3), 1435–1474 (2022)
Liang, Y., Wang, X., Yu, Z., Guo, B., Zheng, X., Samtani, S.: Energy-efficient collaborative sensing: learning the latent correlations of heterogeneous sensors. ACM Trans. Sen. Netw. 17(3), 1–28 (2021)
Tahmasebian, F., Xiong, L., Sotoodeh, M., Sunderam, V.: Crowdsourcing under data poisoning attacks: A comparative study. In: Data and Applications Security and Privacy XXXIV (A. Singhal and J. Vaidya, eds.), (Cham), pp. 310–332, Springer International Publishing, (2020)
Goldblum, M., Tsipras, D., Xie, C., Chen, X., Schwarzschild, A., Song, D., Mądry, A., Li, B., Goldstein, T.: Dataset security for machine learning: data poisoning, backdoor attacks, and defenses. IEEE Trans. Pattern Anal. Mach. Intell. 45(2), 1563–1580 (2023)
Tian, Z., Cui, L., Liang, J., Yu, S.: A comprehensive survey on poisoning attacks and countermeasures in machine learning. ACM Comput. Surv. 55, 1–35 (2022)
Aljanabi, M., Omran, A.H., Mijwil, M.M., Abotaleb, M., El-kenawy, E.-S.M., Mohammed, S.Y., Ibrahim, A.: Data poisoning: issues, challenges, and needs. In: 7th IET Smart Cities Symposium (SCS 2023), Institution of Engineering and Technology, (2023)
Rubinstein, B.I., Nelson, B., Huang, L., Joseph, A.D., hon Lau, S., Rao, S., Taft, N., Tygar, J.D.: Antidote: understanding and defending against poisoning of anomaly detectors. In: Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference - IMC’09, pp. 1–14, ACM Press, (2009)
Rubinstein, B.I., Nelson, B., Huang, L., Joseph, A.D., Hon Lau, S., Rao, S., Taft, N., Tygar, J.D.: Stealthy poisoning attacks on PCA-based anomaly detectors. ACM SIGMETRICS Perform. Eval. Rev. 37, 73–74 (2009)
Biggio, B., Nelson, B., Laskov, P.: Poisoning attacks against support vector machines. In: Proceedings of the 29th International Coference on International Conference on Machine Learning, ICML’12, (Madison, WI, USA), p. 1467–1474, Omnipress, (2012)
Zügner, D., Akbarnejad, A., Günnemann, S.: Adversarial attacks on neural networks for graph data. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, (New York, NY, USA), pp. 2847–2856, Association for Computing Machinery, (2018)
Entezari, N., Al-Sayouri, S.A., Darvishzadeh, A., Papalexakis, E.E.: All you need is low (rank): Defending against adversarial attacks on graphs. In: Proceedings of the 13th International Conference on Web Search and Data Mining, WSDM ’20, (New York, NY, USA), p. 169–177, Association for Computing Machinery, (2020)
Zhang, X., Zitnik, M.: Gnnguard: Defending graph neural networks against adversarial attacks. In: Proceedings of Neural Information Processing Systems, NeurIPS, pp. 1–13, (2020)
Zhang, H., Li, M.: Multi-round data poisoning attack and defense against truth discovery in crowdsensing systems. In: 2022 23rd IEEE International Conference on Mobile Data Management (MDM), pp. 109–118, (2022)
Han, G., Choi, J., Hong, H.G., Kim, J.: Data poisoning attack aiming the vulnerability of continual learning. In: 2023 IEEE International Conference on Image Processing (ICIP), pp. 1905–1909, (2023)
Zhao, Y., Gong, X., Lin, F., Chen, X.: Data poisoning attacks and defenses in dynamic crowdsourcing with online data quality learning. IEEE Trans. Mob. Comput. 22(5), 2569–2581 (2023)
Vuseghesa, F.K., Messai, M.-L.: Study on poisoning attacks: Application through an iot temperature dataset. In: 2023 IEEE International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), pp. 1–6, (2023)
Rosenberg, I., Shabtai, A., Elovici, Y., Rokach, L.: Adversarial machine learning attacks and defense methods in the cyber security domain. ACM Comput. Surv. 54(5), 1–36 (2022). https://doi.org/10.1145/3453158
Wei, W., Chow, K.-H., Wu, Y., Liu, L.: Demystifying data poisoning attacks in distributed learning as a service. IEEE Trans. Serv. Comput. 17(1), 237–250 (2024)
Shejwalkar, V., Houmansadr, A., Kairouz, P., Ramage, D.: Back to the drawing board: A critical evaluation of poisoning attacks on production federated learning. In: 2022 IEEE Symposium on Security and Privacy (SP), pp. 1354–1371, (2022)
Shi, L., Chen, Z., Shi, Y., Zhao, G., Wei, L., Tao, Y., Gao, Y.: Data poisoning attacks on federated learning by using adversarial samples. In: 2022 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), pp. 158–162, (2022)
von Ahn, L., Blum, M., Hopper, N.J., Langford, J.: Captcha: using hard ai problems for security. In: Advances in Cryptology — EUROCRYPT 2003 (E. Biham, ed.), (Berlin, Heidelberg), pp. 294–311, Springer Berlin Heidelberg, (2003)
Challa, Shivani R.K.: Captcha: a systematic review. In: 2020 IEEE International Conference on Advent Trends in Multidisciplinary Research and Innovation (ICATMRI), pp. 1–8, (2020)
Tang, M., Gao, H., Zhang, Y., Liu, Y., Zhang, P., Wang, P.: Research on deep learning techniques in breaking text-based captchas and designing image-based captcha. IEEE Trans. Inf. Forensics Secur. 13(10), 2522–2537 (2018)
von Ahn, L., Maurer, B., McMillen, C., Abraham, D., Blum, M.: recaptcha: Human-based character recognition via web security measures. Science 321(5895), 1465–1468 (2008)
Babaei, M., Ghoushchi, M.B.G., Noori, A.: Yapptcha: yet another picture promoted captcha with spam stopping, image labeling and sift accomplishment. In: 2013 21st Iranian Conference on Electrical Engineering (ICEE), pp. 1–8, (2013)
Mittal, S., Kaushik, P., Hashmi, S., Kumar, K.: Robust real time breaking of image captchas using inception v3 model. In: 2018 Eleventh International Conference on Contemporary Computing (IC3), pp. 1–5, (2018)
Wang, D., Moh, M., Moh, T.-S.: Using deep learning to solve google recaptcha v2’s image challenges. In: 2020 14th International Conference on Ubiquitous Information Management and Communication (IMCOM), pp. 1–5, (2020)
Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. (2020). https://doi.org/10.1186/s12864-019-6413-7
Spamhaus, Spamhaus Botnet Threat Update: Q4 2023. https://www.spamhaus.org/resource-hub/botnet-c-c/botnet-threat-update-q4-2023/, Accessed on 29 April 2024. (2023)
Acknowledgements
This study was funded by the PRR - Plano de Recuperação e Resiliência and by the NextGenerationEU funds at University of Aveiro, through the scope of the Agenda for Business Innovation “NEXUS: Pacto de Inovação - Transição Verde e Digital para Transportes, Logística e Mobilidade” (\(\hbox {Project n}^{\circ }\) 53 with the application C645112083-00000059).
Funding
Open access funding provided by FCT|FCCN (b-on).
Contributions
R.T. developed and executed the experiments, analyzed data, and wrote and reviewed the document. M.A. conceptualized the work, helped in the development of the experiments, performed data analysis, and wrote/reviewed the document. J.B. conceptualized the work, performed data analysis, and reviewed the document. D.G. conceptualized the work and reviewed the document. R. A. conceptualized the work and reviewed the document.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Teixeira, R., Antunes, M., Barraca, J.P. et al. Rethinking security: the resilience of shallow ML models. Int J Data Sci Anal (2024). https://doi.org/10.1007/s41060-024-00655-1