3.1 Attack strategies
Data leakage in SL can occur when the raw input data is inferred directly from the exposed smashed data through straightforward invertibility or more sophisticated reconstruction techniques.
Visual invertibility attack. Abuadbba et al. [
1] are the first to apply SL to a one-dimensional Convolutional Neural Network (1D-CNN) and demonstrate that the smashed data sent from the client to the server still retains a significant amount of information about the original data. The smashed data exhibits a high degree of similarity to the original input, suggesting the potential for substantial leakage. This leakage is quantified using distance correlation and dynamic time-warping metrics, confirming a high risk that the smashed data can be exploited to reconstruct the raw input data. A similar observation can be found in the research conducted by Pham et al. [
29], where the authors identify potential data leakage in CNN-based SL applied to 2D-image data. Fig.
3 illustrates the potential leakage of raw data at the split layer in a typical 2D-CNN. Feature maps plotted from a channel in the first convolution and sub-sampling layers closely resemble the raw input image.
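To make the leakage measurement concrete, the sketch below computes the (biased) sample distance correlation between a batch of raw inputs and the corresponding smashed data; the shapes, the random stand-in for the client activations, and the function name are illustrative assumptions rather than details taken from [1].

```python
import numpy as np

def distance_correlation(x, z):
    """Biased sample distance correlation between two batches of flattened vectors.

    x: (n, d_x) raw inputs, one flattened sample per row
    z: (n, d_z) smashed data, one flattened sample per row
    Returns a value in [0, 1]; values close to 1 indicate strong statistical dependence.
    """
    def centred_distances(a):
        d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
        return d - d.mean(0, keepdims=True) - d.mean(1, keepdims=True) + d.mean()

    A, B = centred_distances(x), centred_distances(z)
    dcov2_xy = max((A * B).mean(), 0.0)   # guard against tiny negative values from rounding
    dcov2_xx = (A * A).mean()
    dcov2_yy = (B * B).mean()
    return np.sqrt(dcov2_xy / np.sqrt(dcov2_xx * dcov2_yy))

# Illustrative usage: a high value suggests the smashed data still carries
# much of the information present in the raw inputs.
raw = np.random.randn(64, 28 * 28)                 # e.g. flattened images
smashed = raw @ np.random.randn(28 * 28, 128)      # stand-in for client activations
print(distance_correlation(raw, smashed))
```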
Feature-space hijacking attack. In [
27], Pasquini et al. introduce a novel attack strategy known as the Feature-Space Hijacking Attack (FSHA), which enables a malicious server to recover private input data during the training process of SL. In FSHA, the server takes control of the learning process of the client models, steering them into a vulnerable state that can be exploited to infer the input data. The authors observe that a server-side attacker can manipulate the direction of the client model's optimisation by influencing the training process, and they exploit this to achieve high-quality reconstruction of the client's private data. However, the effectiveness of this attack is contingent on access to a substantial volume of training data that matches the distribution of the client's data.
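The core of the hijacking loop can be sketched as follows; the pilot encoder, decoder, discriminator, their architectures, and the loss formulation are illustrative placeholders for the components described in [27], not a faithful reimplementation.

```python
import torch
import torch.nn.functional as F

# Illustrative components; architectures and sizes are placeholders.
pilot = torch.nn.Linear(784, 128)      # server's substitute ("pilot") encoder
decoder = torch.nn.Linear(128, 784)    # approximate inverse of the pilot
disc = torch.nn.Linear(128, 1)         # feature-space discriminator

opt_pilot = torch.optim.Adam(list(pilot.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-3)

def malicious_server_step(smashed, x_pub):
    """One hijacking step on the server.

    smashed: activations received from the client (tracked for autograd, as in SL)
    x_pub:   a public batch assumed to follow a distribution similar to the client's data
    """
    x_pub = x_pub.flatten(1)

    # 1. Train pilot + decoder so that decoder(pilot(x)) reconstructs x.
    rec_loss = F.mse_loss(decoder(pilot(x_pub)), x_pub)
    opt_pilot.zero_grad(); rec_loss.backward(); opt_pilot.step()

    # 2. Train the discriminator to tell pilot features (label 1) from client features (label 0).
    real = disc(pilot(x_pub).detach())
    fake = disc(smashed.detach())
    d_loss = F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) + \
             F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # 3. Instead of a genuine task loss, the server returns a loss whose gradient
    #    (w.r.t. the smashed data) pushes the client's features towards the pilot's
    #    feature space; that gradient is what gets sent back to the client.
    scores = disc(smashed)
    return F.binary_cross_entropy_with_logits(scores, torch.ones_like(scores))

# Once the client's feature space aligns with the pilot's, decoder(smashed)
# approximates the client's private inputs.
```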
Model inversion attacks. Diverging from FSHA, the model inversion attack, as outlined by Erdogan et al. in [
8], pursues a distinct objective. This attack aims to obtain a functionally equivalent model to the client's model and to recover the raw training data without any prior knowledge of the client's dataset. The sole assumption is that the attacker knows the architecture of the client-side model. Without data resembling the training data and unable to query the client model, the attacker must search the entire space of possible input values and client model parameters. The attacker rigorously adheres to the SL protocol and needs only access to the smashed data to perform the model inversion and data inference, which makes such an attack difficult for clients to detect. In [
14], He et al. explore various attack scenarios: (i) the white-box scenario, where the attacker has access to the local model and uses it to reconstruct the images; (ii) the black-box scenario, where the attacker lacks knowledge of the local model but can query it to recreate a similar one; and (iii) the query-free scenario, where the attacker cannot query the client but aims to construct a substitute model for data reconstruction. The last scenario yields the least favourable results, as expected, given the limited capabilities of the attacker. Additionally, the architecture of the model and the division of layers between the client and the server influence the quality of reconstruction. Having fewer layers in the client generally leads to better reconstruction by the centralised server.
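In the white-box scenario, reconstruction essentially reduces to optimising a candidate input so that the known client model reproduces the intercepted smashed data. The sketch below illustrates this under assumed shapes and a plain mean-squared-error objective (regularisers such as total variation, often used in practice, are omitted).

```python
import torch

def whitebox_invert(client_model, observed_smashed, input_shape, steps=2000, lr=0.1):
    """Optimise an input so that the known client model reproduces the observed smashed data.

    client_model:     the client-side network (known to the attacker in the white-box case)
    observed_smashed: smashed data intercepted at the split layer
    input_shape:      e.g. (1, 1, 28, 28) -- an illustrative assumption
    """
    x = torch.randn(input_shape, requires_grad=True)   # random initial guess
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(client_model(x), observed_smashed)
        loss.backward()
        opt.step()
    return x.detach()
```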
In another approach, Gao and Zhang [
11] propose a passive inference attack named Pseudo-Client ATtack (PCAT), in which the server adheres to the SL training protocol but attempts to infer the private data of the clients by analysing the exposed smashed data. The attacker needs access to only a small amount of training data to develop a data reconstruction mechanism comparable to FSHA. Notably, PCAT does not disrupt the primary training process, making it challenging to detect. While previous attacks often rely on strong assumptions or target easily exploitable models, Zhu et al. introduce a more practical approach in [
48]. They present Simulator Decoding with Adversarial Regularisation (SDAR), which leverages auxiliary data and adversarial regularisation to learn a decodable simulator of the client’s private model. SDAR, when applied against SL with a semi-trusted server, can effectively infer the client’s private features in vanilla SL, and both features and labels in u-shaped SL.
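A rough sketch of the pseudo-client idea follows, under the simplifying assumptions that the server holds the labels (label-sharing SL) and a small auxiliary dataset; the module sizes, the joint training schedule, and the loss weighting are illustrative and not the exact procedure of [11].

```python
import torch
import torch.nn.functional as F

# The server trains a pseudo-client and a decoder on a small auxiliary dataset,
# while (not shown) it keeps honestly training its own model part on the real
# smashed data it receives from the client.
pseudo_client = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU())
decoder = torch.nn.Linear(128, 784)
opt = torch.optim.Adam(list(pseudo_client.parameters()) + list(decoder.parameters()), lr=1e-3)

def pseudo_client_step(server_model, x_aux, y_aux):
    """Align the pseudo-client with the real client via the shared server model,
    and learn a decoder from that feature space back to the inputs."""
    z = pseudo_client(x_aux.flatten(1))
    task_loss = F.cross_entropy(server_model(z), y_aux)   # ties z to the server's feature space
    rec_loss = F.mse_loss(decoder(z), x_aux.flatten(1))   # learn to invert that feature space
    opt.zero_grad()
    (task_loss + rec_loss).backward()
    opt.step()

# At attack time the server simply applies `decoder` to the real client's smashed data.
```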
3.2 Defence approaches
Data encryption. One potential solution is to utilise privacy-preserving techniques that encrypt the model and data, allowing one organisation to use a model held by another without revealing its proprietary information. Two well-known techniques that enable computation over encrypted data while preserving privacy are Homomorphic Encryption (HE) [
2] and Secure Multi-Party Computation (SMPC) [
22]. Both techniques appear to be promising solutions, as they enable computations on encrypted data without disclosing the underlying information. However, underlying challenges can complicate their implementation, such as the computational complexity of HE and the communication costs of SMPC [
3].
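As a flavour of the SMPC side, the snippet below shows plain additive secret sharing over a prime field, the building block used by many SMPC frameworks: neither share alone reveals the secret, yet additions can be carried out share-wise. This is a textbook illustration and is not tied to any particular SL protocol.

```python
import secrets

P = 2**61 - 1  # public prime modulus

def share(secret):
    """Split an integer secret into two additive shares modulo P."""
    r = secrets.randbelow(P)
    return r, (secret - r) % P

def reconstruct(s1, s2):
    return (s1 + s2) % P

# Each party holds one share of every value; sums are computed locally on the
# shares and only the final result is reconstructed.
a1, a2 = share(42)
b1, b2 = share(100)
sum1 = (a1 + b1) % P    # held by party 1
sum2 = (a2 + b2) % P    # held by party 2
assert reconstruct(sum1, sum2) == 142
```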
While numerous privacy-preserving ML works employ HE to protect users' inputs, relatively few combine HE with SL. Pereteanu et al. [
28] propose a method in which the server-side model is divided into private sections separated by a public section that the client can access in plaintext, in order to speed up classification while utilising HE. This approach is limited to classifying client inputs and does not allow a client to customise a model part for its private dataset. Recently, Khan et al. [
15] introduce an approach that combines SL and HE, in which the client encrypts the smashed data before sending it to the server. However, one limitation of this hybrid approach is that, during backward propagation, the server can extract valuable information about the client's input data by exploiting the gradients sent from the client, potentially leading to privacy breaches. In response to this concern, Nguyen et al. [
26] propose an enhanced protocol to mitigate this data leakage issue in [
15], offering improved speed and reduced communication overhead.
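On the HE side, the sketch below shows a client encrypting its smashed activations with the CKKS scheme (here via the TenSEAL library) before sending them to the server, which can then evaluate a simple linear operation directly on the ciphertext; the encryption parameters, vector sizes, and the plaintext dot product are illustrative and do not reproduce the protocols of [15] or [26].

```python
import numpy as np
import tenseal as ts

# CKKS context; parameters are illustrative and not tuned for security or accuracy.
context = ts.context(ts.SCHEME_TYPE.CKKS,
                     poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()

smashed = np.random.randn(64)                             # stand-in for the client's activations
enc_smashed = ts.ckks_vector(context, smashed.tolist())   # encrypted before leaving the client

# The server can evaluate simple linear operations on the ciphertext, e.g. a dot
# product with its own plaintext weight vector.
server_weights = np.random.randn(64)
enc_out = enc_smashed.dot(server_weights.tolist())

# Only the client, which holds the secret key, can decrypt the result.
print(enc_out.decrypt())
```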
More recently, Khan et al. [
16,
17] devise a protocol that allows u-shaped SL to operate on homomorphically encrypted data. In their approach, the client applies HE to the smashed data before sending it to the server, effectively protecting user data privacy. However, their work focuses primarily on 1D time-series data, such as ECG signals, and is limited to a single client. Extending the approach to multiple clients, which would require a multi-key HE scheme, is left as future work by the authors.
Data decorrelation. In response to the potential data leakage from SL smashed data, Abuadbba et al. [
1] explore a strategy to mitigate privacy risks by introducing additional hidden layers into the local model, specifically by adding more convolutional layers on the client side before the split layer. This results in a more complex client model while keeping the number of layers held by the server constant. Their evaluation reveals a slight reduction in the distance correlation between the input and the smashed data as the number of hidden convolutional layers increases. However, some highly correlated channels remain, indicating that significant leakage and raw data reconstruction are still possible.
Another approach, as proposed in [
44,
45] by Vepakomma et al., aims to bolster privacy within SL by introducing a loss term based on distance correlation into the overall loss function. Distance correlation (DC), a measure of statistical dependence between random variables, is used to minimise the correlation between the original input and the smashed data. Jointly optimising the standard task loss and the DC loss term reduces the information available in the smashed data for raw data reconstruction while preserving the model's accuracy. It is important to note that adding the DC term to the server's loss function could itself pose privacy risks, potentially enabling attackers to reconstruct the original input data if they possess both the DC value and the smashed data transmitted over the network. To address this concern, Turina et al. [
40] introduce a client-based privacy protection method integrated into a hybrid FL-SL framework. This approach employs two distinct loss functions, one dedicated to the clients and the other to the server. The first loss function prioritises privacy, incorporating elements such as DC or differential privacy, and operates exclusively on the client side. The second (global) loss function is computed on the server and drives the training of both clients and server. Empirical evidence demonstrates the effectiveness of this approach in maintaining data privacy in both hybrid FL-SL and parallel SL setups. Moreover, the client-based privacy approach employing DC outperforms the noise-based approach in balancing privacy and model accuracy.
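A condensed sketch of the loss combination used in this line of work is given below: a differentiable (biased) distance-correlation term between the raw inputs and the smashed data is added to the task loss with a weighting coefficient alpha. The estimator and the weighting are illustrative rather than the exact formulation of [44, 45].

```python
import torch

def dist_corr_sq(x, z, eps=1e-9):
    """Differentiable (biased) squared distance correlation between flattened batches."""
    def centred(a):
        d = torch.cdist(a, a)
        return d - d.mean(0, keepdim=True) - d.mean(1, keepdim=True) + d.mean()
    A, B = centred(x.flatten(1)), centred(z.flatten(1))
    dcov2_xy = (A * B).mean()
    dcov2_xx = (A * A).mean()
    dcov2_yy = (B * B).mean()
    return dcov2_xy / (torch.sqrt(dcov2_xx * dcov2_yy) + eps)

def client_loss(task_loss, x, smashed, alpha=0.1):
    # Penalise statistical dependence between the raw inputs and the smashed
    # data while still optimising the main task objective.
    return task_loss + alpha * dist_corr_sq(x, smashed)
```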
Another engineering-driven approach to minimise the information transmitted in SL is through the selective pruning of channels in the client-side smashed data, as demonstrated by Singh et al. in [
34]. Learning a pruning filter to selectively remove channels in the latent representation space at the split layer is empirically shown to prevent various state-of-the-art reconstruction attacks during the prediction step in private collaborative inference scenarios.
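The pruning idea can be sketched as a learnable per-channel gate at the split layer, trained with the task loss plus a sparsity penalty so that privacy-sensitive channels are driven towards zero; the gating mechanism and penalty below are an illustrative stand-in for the pruning filter of [34].

```python
import torch

class ChannelGate(torch.nn.Module):
    """Learnable per-channel gate applied to the smashed data at the split layer."""

    def __init__(self, num_channels):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(num_channels))

    def forward(self, smashed):                        # smashed: (B, C, H, W)
        gate = torch.sigmoid(self.logits).view(1, -1, 1, 1)
        return smashed * gate                          # soft pruning of channels

    def sparsity_penalty(self):
        # Pushes most gates towards zero so that few channels are actually transmitted.
        return torch.sigmoid(self.logits).sum()

# Illustrative client-side objective:
#   loss = task_loss + beta * gate.sparsity_penalty()
```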
Quantisation provides another avenue for decorrelating input and smashed data. Yu et al. [
46] introduce the Stepwise activation function to render activation outputs irreversible; the effectiveness of this approach depends on the Stepwise parameters, which trade accuracy off against privacy preservation. An extreme quantisation approach, known as Binarised SL (B-SL) and proposed by Pham et al. in [
29], binarises the local SL model, including the smashed data exposed to the server. Binarisation introduces latent noise into the smashed data, effectively diminishing the server's capacity to reconstruct the original training data. Furthermore, the authors incorporate an extra loss term alongside the standard accuracy loss to minimise the leakage of locally sensitive data. Note that the loss term in the B-SL framework is versatile and not restricted to the DC term used in [
44]. Additionally, the authors provide three methods for implementing differential privacy within the B-SL framework to ensure privacy guarantees. Experimental results reported in [
29] demonstrate the effectiveness of B-SL in mitigating privacy vulnerabilities under FSHA attacks.
In a different approach, Qiu et al. [
32] recommend hashing as a protective measure against reconstruction attacks. Their approach applies the Sign function to the smashed data before sending the result to the server, making data reconstruction exceedingly difficult. To preserve the model's trainability with the Sign function, the authors leverage batch normalisation and the straight-through estimator. Both techniques help reinforce the defence against reconstruction attacks while upholding high accuracy.
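A minimal sketch of this idea: the smashed data is batch-normalised and passed through a Sign function before transmission, while a straight-through estimator keeps the client model trainable. The clipping rule in the backward pass is a common STE choice and an assumption here, not necessarily the variant used in [32].

```python
import torch

class SignSTE(torch.autograd.Function):
    """Sign in the forward pass; straight-through (clipped identity) gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1 (a common STE clipping choice).
        return grad_output * (x.abs() <= 1).float()

class BinarisedSplitLayer(torch.nn.Module):
    def __init__(self, num_channels):
        super().__init__()
        self.bn = torch.nn.BatchNorm2d(num_channels)

    def forward(self, smashed):
        # The server only ever receives sign values, never the raw activations.
        return SignSTE.apply(self.bn(smashed))
```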
Noise-based mechanisms. Noise-based mechanisms offer a defence strategy that avoids the heavy computational burden of cryptographic primitives. These mechanisms adaptively inject noise into the smashed data while retaining the server's ability to perform its task. Noise is treated as an additional set of trainable parameter probabilities, which can be gradually eliminated through end-to-end self-supervised training [
4]. For instance, Shredder [
24], proposed by Mireshghallah et al., achieves an asymmetric balance between accuracy and privacy by adding noise as part of the gradient-based learning process, effectively reducing the information content of the smashed data sent by clients to servers for inference. Similarly, Abuadbba et al. [
1] and Titcombe et al. [
37] apply noise to the smashed data before transmitting it to the server, framing this defence as a differential privacy (DP) mechanism [
6].
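A minimal sketch of this kind of client-side defence: the smashed data is clipped and perturbed with Laplace noise before transmission, in the spirit of the DP framing in [37]; the clipping bound and noise scale are illustrative parameters with no calibrated privacy guarantee.

```python
import torch

def noisy_smashed(smashed, clip=1.0, scale=0.5):
    """Clip the smashed data and add Laplace noise before sending it to the server."""
    clipped = torch.clamp(smashed, -clip, clip)
    noise = torch.distributions.Laplace(0.0, scale).sample(clipped.shape)
    return clipped + noise

# Client side (illustrative):
#   z = client_model(x)
#   send_to_server(noisy_smashed(z))
```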
In another approach, Mahawaga Arachchige et al. [
23] provide a differentially private mechanism for sharing activations following a flattening layer that comes after convolutional and pooling layers. These flattened outputs are binarised, and a utility-enhanced randomisation mechanism, inspired by RAPPOR [
9], is applied to create a differentially private binary representation. These binary representations are then sent to the server, where fully connected layers perform final predictions. Vepakomma et al. [
41] propose PrivateMail, a differentially private mechanism for supervised manifold embedding of features extracted from deep networks for image retrieval tasks. PrivateMail is claimed to achieve a substantially improved balance between privacy and utility compared to several baselines. More recently, Ryu et al. [
33] conduct a systematic study to assess the effectiveness of DP in collaborative inference against reconstruction attacks. In summary, noise-based mechanisms can defend against data leakage, whether directly from the smashed data or via reconstruction attacks. However, adding noise can significantly degrade the model's accuracy, even at modest noise levels [
37].
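The randomisation step applied to binarised activations can be illustrated with basic randomised response: each bit is kept with a probability determined by the privacy budget epsilon and flipped otherwise. This is a simplified stand-in for the utility-enhanced, RAPPOR-inspired mechanism of [23].

```python
import numpy as np

def randomised_response(bits, epsilon):
    """Flip each binary activation with probability 1 / (1 + e^epsilon).

    Keeping a bit with probability e^epsilon / (1 + e^epsilon) satisfies
    epsilon-local differential privacy for that bit.
    """
    keep_prob = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    keep = np.random.rand(*bits.shape) < keep_prob
    return np.where(keep, bits, 1 - bits)

# Example: binarised flattened activations sent to the server.
binarised = (np.random.randn(1, 256) > 0).astype(int)
private_bits = randomised_response(binarised, epsilon=2.0)
```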
Protecting from model inversion attacks. DP, while effective in preventing data leakage, often comes at the cost of model accuracy. Recently, Pham et al. in [
31] develop a new SL framework in which client-side data privacy is enhanced without the need for sequential data sharing between clients. By disallowing the sharing of local models among clients, the risk of local models being inverted for data reconstruction is reduced. The authors demonstrate that this non-local-sharing SL can reduce leakage due to model inversion attacks by half, though attackers can still attempt to reconstruct private data by modifying the Deep Leakage attack [
47].
To safeguard hybrid SL-FL from model inversion threats, Li et al. [
21] propose a model inversion-resistant framework called ResSFL. ResSFL involves two key steps: an initial pre-training phase that constructs a feature extractor designed to withstand model inversion, followed by a subsequent resistance transfer phase that employs this feature extractor to initialise client-side models. During the pre-training phase, an attacker-aware training technique is employed, mimicking an attacker with a robust inversion model and introducing bottleneck layers to limit the feature space. Typically, this pre-training is conducted by an expert, often a powerful client or a third party with sufficient computational resources. In the second phase, the robust feature extractor is utilised to initialise the SL-FL training scheme for a new task. In another study, Khowaja et al. [
18] propose a method that segments raw data into patches, making recovery of the original data more challenging. The authors emphasise the growing concern regarding model security and suggest that their proximal gradient-based learning networks can effectively thwart model inversion attacks. Results from these studies indicate that the reconstructed data often fails to yield meaningful information.
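The attacker-aware pre-training phase can be sketched as a two-player loop: a simulated inversion network is trained to reconstruct inputs from the client-side features, while the feature extractor is trained to solve the task and to maximise that inversion error. The modules, coefficients, and alternation schedule below are illustrative and do not reproduce the exact ResSFL procedure.

```python
import torch
import torch.nn.functional as F

# Illustrative components; the narrow 64-dimensional output plays the role of a bottleneck.
feature_extractor = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 64))  # client side
task_head = torch.nn.Linear(64, 10)                                                    # server side
inversion_sim = torch.nn.Linear(64, 784)                                               # simulated attacker

opt_main = torch.optim.Adam(list(feature_extractor.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_inv = torch.optim.Adam(inversion_sim.parameters(), lr=1e-3)

def attacker_aware_step(x, y, lam=0.5):
    # 1. Train the simulated attacker to invert the current features.
    z = feature_extractor(x).detach()
    inv_loss = F.mse_loss(inversion_sim(z), x.flatten(1))
    opt_inv.zero_grad(); inv_loss.backward(); opt_inv.step()

    # 2. Train the feature extractor (and task head) for the task while making inversion hard.
    z = feature_extractor(x)
    task_loss = F.cross_entropy(task_head(z), y)
    resist_loss = -F.mse_loss(inversion_sim(z), x.flatten(1))   # maximise the attacker's error
    opt_main.zero_grad(); (task_loss + lam * resist_loss).backward(); opt_main.step()
```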
Protecting from feature-space hijacking attacks. The division of the model into client and server parts in SL introduces a unique type of inference attack, enabling a malicious server to influence the client model’s training process and infer training data, as exemplified by the FSHA. Regarding defence mechanisms, Erdogan et al. [
7] argue that the direction of the client's parameter updates under FSHA is unrelated to the primary task. Consequently, introducing a small amount of erroneous data during training enables clients to monitor changes in the gradient information provided by the server, assisting in the detection of malicious behaviour. However, Fu et al. [
10] put forward an attack strategy aimed at circumventing this detection mechanism. They clarify that the malicious server in FSHA is fundamentally constructing an auto-encoder, a behaviour that clients can identify by comparing the expected model gradients with those of the auto-encoder.
Research conducted by Gawron and Stubbings [
12] highlights that DP might not provide adequate protection against FSHA. In their investigation, they apply FSHA to SL protected by DP using a client-side DP optimiser. The empirical findings suggest that while DP can delay the convergence of FSHA, this attack method still successfully reconstructs the client’s private data with a low margin of error at various DP settings. Furthermore, the authors explore the utilisation of dimensionality reduction techniques applied directly to the raw data before training as a privacy protection measure. This approach is found to partially mitigate FSHA but could impact model accuracy, especially when dealing with large datasets.