Data Leakage Threats and Protection in Split Learning: A Survey

Published: 26 December 2024

Abstract

Split learning (SL) is a pivotal framework in distributed learning, intended to facilitate on-device machine learning while prioritising the protection of users’ private data. Nevertheless, concerns about potential data leakage arise from recent privacy attacks on SL. Unlike federated learning, another prominent distributed learning framework whose privacy and security aspects have been surveyed extensively in recent years, SL lacks comprehensive reviews. This paper seeks to bridge that gap by analysing more than 30 recent papers that address privacy attacks and defences within the context of SL. Our analysis delves into the various attack surfaces and threat models underpinning different attacks that can lead to the leakage of users’ raw input data. Subsequently, we review the most commonly proposed defence mechanisms and discuss the open challenges and future directions our analysis has identified.

1 Introduction

Owing to their success in domains such as medicine, vision, recommendation systems, and natural language processing, Deep Neural Networks (DNNs) are now deployed in numerous production systems [20]. Developing artificial intelligence applications often necessitates a substantial volume of user data to effectively train highly accurate machine learning (ML) models. Conventional centralised learning methods involve the direct aggregation of all relevant data from local sources to facilitate model training. This approach not only results in significant storage complexities and computational expenses but also raises substantial privacy concerns. To mitigate these drawbacks, distributed learning techniques have been introduced, including split learning (SL) [13] and federated learning (FL) [19]. In these approaches, ML models are collaboratively trained by multiple local data parties (clients) under the coordination of a central server in the cloud.
In SL, a neural network (NN) is divided into two parts: the first half is allocated to the client(s), while the second part remains with the server. Although both the client(s) and the server jointly train the split model, they are unable to access each other’s parts. SL offers numerous advantages, including (i) collaborative NN training by multiple parties, ensuring that each party preserves the confidentiality of its model part, (ii) ML model training by users without the need to share their raw data with a server, thereby safeguarding their privacy, (iii) reduced computational burden for the client, as SL operates with a smaller number of layers at the client side, and (iv) comparable model accuracy to non-split models [1]. Additionally, SL presents several notable advantages over FL, including (i) the absence of full disclosure of the model’s architecture and weights, (ii) peak performance with minimal resource consumption, (iii) reduced bandwidth requirements, and (iv) a lightened computational burden on the data owners’ devices.
Both FL and SL serve as techniques for maintaining the privacy of user data. While there are existing surveys focusing on FL [25, 39], the same comprehensive attention has not been directed toward SL. SL is typically integrated into FL to alleviate computation burdens on low-end devices; therefore, an in-depth review and examination of privacy considerations in SL is needed for the development of both SL and FL. In [35], while providing a comparison between FL and SL along with key achievements, the authors review two countermeasures against information leakage in SL, namely differential privacy and distance correlation techniques. Information leakage is also emphasised in a tutorial on advanced SL in [38], which discusses three proposed measures to prevent attacks. Moreover, a comprehensive survey on combined FL and SL in edge computing is presented in [5], where the authors review privacy attacks and defensive strategies in SL; however, it omits invertibility attacks and the ongoing trend of introducing homomorphic encryption into SL, both of which are explored in this paper. In summary, the main contributions of this paper are:
The first comprehensive examination of attacks that can potentially lead to data leakage in SL.
An overview of the diverse defence mechanisms aimed at protecting against various attack vectors.
A discussion on a unified taxonomy for categorising attacks and defences, coupled with insights into future trends.
The rest of this paper is structured as follows: firstly, Section 2 introduces the commonly considered threat models in SL. Subsequently, Section 3 reviews the current attacks that lead to data leakage along with defensive mechanisms. Finally, Section 4 concludes the survey with a discussion.

2 Preliminaries

2.1 Split learning

In 2018, Gupta and Raskar introduced a novel collaborative learning approach named SplitNN [13, 42], also referred to as split learning (SL), to safeguard user privacy by enabling model training without the need to share users’ raw data with the server running a deep model. In general, SL divides the DNN’s layers into two parts: the head and the tail. These parts are distributed between the client and the server. The client, which possesses the raw data, is responsible for training the head part, consisting of the initial layers, using forward propagation. Importantly, the client only sends the activated outputs from the split layer (the final layer of the head part) to the server. These activated outputs are referred to as smashed data. Upon receiving the smashed data, the server conducts forward training on the tail part, which is the most computationally intensive part. The server then initiates backward propagation on the tail part and sends only the gradients of the split layer back to the client, which completes the backward propagation for the head part. This iterative process continues until the model converges.
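To make the protocol concrete, the following minimal PyTorch sketch walks through one training step of vanilla SL with a single client. The toy architecture, optimiser settings, and variable names are illustrative assumptions, not the reference implementation of [13].

```python
import torch
import torch.nn as nn

# Hypothetical split of a small CNN: `head` stays on the client, `tail` on the server.
head = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
tail = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16 * 16, 10))  # assumes 32x32 RGB inputs

opt_client = torch.optim.SGD(head.parameters(), lr=0.01)
opt_server = torch.optim.SGD(tail.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y):
    # --- client side: forward pass up to the split layer ---
    smashed = head(x)                                 # "smashed data" sent to the server
    smashed_srv = smashed.detach().requires_grad_()   # the server only sees activations

    # --- server side: finish the forward pass, then start backward ---
    logits = tail(smashed_srv)
    loss = loss_fn(logits, y)
    opt_server.zero_grad()
    loss.backward()                                   # gradients for tail and for smashed_srv
    opt_server.step()

    # --- client side: receive split-layer gradients, finish backward ---
    opt_client.zero_grad()
    smashed.backward(smashed_srv.grad)                # continue backpropagation through the head
    opt_client.step()
    return loss.item()
```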
Figure 1: Different configurations of SL [42].
Fig. 1 demonstrates a vanilla setting with two parties, namely one data entity (client) and one processing entity (server), a u-shaped configuration for private label protection, and a vertically partitioned data setting for different data modalities. SL can be extended to a multi-client environment with multiple data entities, where a snapshot of the local model is passed through the clients sequentially during training, as described in [13]. In [43], various topologies are presented in which SL can be applied. These include:
An expansion of the basic SL concept involving vertically partitioned data, where the concatenated smashed data undergoes further processing at additional clients before reaching the server.
Multi-task SL, which leverages multi-modal data from various clients to train partial NNs up to their respective split layers. The combined smashed data are subsequently transmitted to multiple servers to train multiple models towards different learning tasks.
Multi-hop SL, where multiple clients consecutively train partial NNs and relay their outputs to the following client in sequence. The relay process continues until the final client sends its output to a server to complete the training.
Regardless of the configurations, local data remains local within SL, and only the smashed data is exposed for processing at the next party. Due to this characteristic of SL, we analyse the attack surface and corresponding threat models for potential data leakage in SL.

2.2 Threat models

Threat models in ML help identify and define potential security issues. They are defined in terms of the available information and the attacker’s scope of action. In the context of SL, attackers typically have one of two adversarial goals. The primary goal is to extract private information from victim clients, while the secondary goal is to intentionally manipulate the model’s behaviour. This manipulation can involve introducing backdoors, inducing misclassification, or rendering the model unusable. Since this survey focuses on privacy attacks arising from unintentional information leakage regarding the exposed smashed data or the ML model, security-based attacks are not covered here.
Throughout this paper, we consider potential attackers to be insiders, meaning that they may belong to the group of clients involved in the SL or even be associated with the centralised server itself. This assumption is based on the understanding that all communication between the server and clients can be secured using state-of-the-art cryptographic techniques. A fundamental tenet of SL is to ensure that the server remains oblivious to any private training data of the clients. Consequently, most threat models in the literature adopt the semi-trusted server model. The attack surface encompasses the set of vulnerabilities within the system that can be exploited. This attack surface can be further subdivided into scenarios involving either a malicious client or a malicious server. The susceptible points of attack in SL are illustrated in the bottom part of Fig. 2.
Figure 2: Top: SL process: Client sends smashed data to the server and receives corresponding gradients. a) In sequential training, a snapshot of the local model is shared with the next client. b) In SFL [36], a variant that combines SL and FL, a local Fed server is used to aggregate all local models. Bottom: Vulnerable points defining the attack surface for all participants in SL.
In SL, raw data remains localised, so any information exposure is indirect (inferred). Considering the architecture and procedure of SL, several attack surfaces must be considered for data privacy, ranging from the exposed smashed data to the sharing of model snapshots among local clients. In the common semi-trusted server model, certain assumptions increase the server’s capabilities, such as the server having information about the local model or the client being queryable [14]. A recent work [31] considers the strongest threat model, where SL is conducted in a fully semi-trusted environment in which any participant (client or server) can potentially act as an attacker. The feature-space hijacking attack [27] is an exceptional case in which the authors assume that the server is non-trustworthy, meaning the server does not strictly follow the learning procedure. However, its objective is to steal input data without harming the model’s utility, making it a privacy attack in our study.
The following section reviews the current attacks on this exposed information in SL, followed by defence strategies.

3 Data Leakage in Split Learning

3.1 Attack strategies

Data leakage in SL can occur when the raw input data is inferred directly from the exposed smashed data through straightforward invertibility or more sophisticated reconstruction techniques.
Visual invertibility attack. Abuadbba et al. [1] first apply SL to a one-dimensional Convolutional Neural Network (1D-CNN) and demonstrate that the smashed data sent from the client to the server still retains a significant amount of information about the original data. The smashed data exhibits a high degree of similarity to the original input, suggesting the potential for substantial leakage. This potential leakage is quantified using distance correlation and dynamic time-warping metrics, affirming the high risk of its exploitation for reconstructing the raw input data. A similar observation can be found in the research conducted by Pham et al. [29], where the authors identify potential data leakage in CNN-based SL applied to 2D-image data. Fig. 3 illustrates the potential leakage of raw data at the split layer in a typical 2D-CNN. The visually plotted feature maps from a channel in the first convolution and sub-sampling layers closely resemble the raw input image.
Figure 3: SL data leakage through visual invertibility [29].
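As a rough illustration of how such leakage can be quantified, the sketch below computes the (biased) sample distance correlation between a batch of raw inputs and the corresponding smashed data. The NumPy implementation and function name are ours, not the exact code of [1]; values close to 1 indicate that the smashed data still mirrors the raw input.

```python
import numpy as np

def distance_correlation(x, z):
    """Biased sample distance correlation between two batches (rows are samples)."""
    x = np.asarray(x, dtype=np.float64).reshape(len(x), -1)
    z = np.asarray(z, dtype=np.float64).reshape(len(z), -1)

    def centred_dist(a):
        d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)   # pairwise distances
        return d - d.mean(0, keepdims=True) - d.mean(1, keepdims=True) + d.mean()

    A, B = centred_dist(x), centred_dist(z)
    dcov2 = (A * B).mean()
    dvar_x, dvar_z = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / (np.sqrt(dvar_x * dvar_z) + 1e-12))

# Example (hypothetical arrays): leakage = distance_correlation(raw_batch, smashed_batch)
```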
Feature-space hijacking attack. In [27], Pasquini et al. introduce a novel attack strategy known as the Feature-Space Hijacking Attack (FSHA), which enables a malicious server to recover private input data during the training process of SL. FSHA involves the server assuming control over the learning process of the client models, steering them into a vulnerable state that can be exploited for the inference of input data. The authors recognise that a server-based attacker has the potential to manipulate the direction of optimisation for the client model by influencing the training process. Consequently, they develop FSHA with the aim of achieving high-quality reconstruction of the client’s private data. However, the effectiveness of this attack is contingent on access to a substantial volume of training data that aligns with the distribution of the client’s data.
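The following PyTorch-style sketch conveys the structure of an FSHA-like server. It assumes toy placeholder architectures, flattened 28x28 inputs, and an auxiliary public dataset held by the server; it is a simplified illustration of the idea in [27], not the authors’ implementation.

```python
import torch
import torch.nn as nn

FEAT = 64                                                             # assumed split-layer width
pilot   = nn.Sequential(nn.Flatten(), nn.Linear(784, FEAT))           # f~ : server-side pilot network
decoder = nn.Sequential(nn.Linear(FEAT, 784), nn.Sigmoid())           # f~^-1 : reconstructs inputs
disc    = nn.Sequential(nn.Linear(FEAT, 1))                           # D : client vs. pilot features

opt_srv = torch.optim.Adam(list(pilot.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_d   = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def server_step(smashed_priv, x_pub):
    """One FSHA-style step: smashed_priv comes from the client, x_pub from the server's
    auxiliary dataset drawn from a similar distribution."""
    # 1) Train pilot + decoder so that f~ features are invertible on public data.
    feat_pub = pilot(x_pub)
    rec_pub = decoder(feat_pub)
    loss_ae = ((rec_pub - x_pub.flatten(1)) ** 2).mean()

    # 2) Train the discriminator to separate client features from pilot features.
    d_loss = bce(disc(feat_pub.detach()), torch.ones(len(x_pub), 1)) + \
             bce(disc(smashed_priv.detach()), torch.zeros(len(smashed_priv), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    opt_srv.zero_grad(); loss_ae.backward(); opt_srv.step()

    # 3) Instead of a true task gradient, return the gradient of an adversarial loss that
    #    steers the client model's feature space towards f~'s feature space.
    smashed = smashed_priv.detach().requires_grad_()
    adv_loss = bce(disc(smashed), torch.ones(len(smashed), 1))
    adv_loss.backward()
    return smashed.grad          # sent back to the client as if it were a task gradient

# After convergence, the server reconstructs private inputs with decoder(smashed_priv).
```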
Model inversion attacks. Diverging from FSHA, the model inversion attack outlined by Erdogan et al. in [8] pursues a distinct objective. This attack strategy aims to obtain a functionally equivalent model to the client’s model and access the raw training data without relying on any prior knowledge of the client’s dataset. The sole assumption underlying this attack is that the attacker knows the client-side model’s architecture. Without data resembling the training data and unable to query the client model, the attacker must search the entire space of possible input values and client model parameters. The attacker strictly adheres to the SL protocol and requires only access to the smashed data to execute the model inversion and data inference, which makes such an attack difficult for clients to detect. In [14], He et al. explore various attack scenarios: (i) the white-box scenario, where the attacker has access to the local model and uses it to reconstruct the images; (ii) the black-box scenario, where the attacker lacks knowledge of the local model but can query it to recreate a similar one; and (iii) the query-free scenario, where the attacker cannot query the client but aims to construct a substitute model for data reconstruction. The last scenario yields the least favourable results, as expected, given the limited capabilities of the attacker. Additionally, the architecture of the model and the division of layers between the client and the server influence the quality of reconstruction: having fewer layers on the client side generally leads to better reconstruction by the centralised server.
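A minimal sketch of the white-box variant is given below: given the observed smashed data and a copy of the client model, the attacker optimises a candidate input until its smashed representation matches the observation. The total-variation prior and hyper-parameters are illustrative choices, not those of [8] or [14].

```python
import torch

def invert_smashed(client_model, z, shape=(1, 3, 32, 32), steps=2000, lr=0.1, tv_weight=1e-4):
    """White-box inversion sketch: recover an input whose smashed data matches z."""
    x = torch.rand(shape, requires_grad=True)          # random initial guess in image range
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        rec_loss = ((client_model(x) - z) ** 2).mean()  # match the observed smashed data
        tv = (x[..., 1:, :] - x[..., :-1, :]).abs().mean() + \
             (x[..., :, 1:] - x[..., :, :-1]).abs().mean()  # smoothness prior on the image
        (rec_loss + tv_weight * tv).backward()
        opt.step()
        x.data.clamp_(0, 1)                             # keep the candidate in valid pixel range
    return x.detach()
```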
In another approach, Gao and Zhang [11] propose a passive inference attack named Pseudo-Client ATtack (PCAT), in which the server adheres to the SL training protocol but attempts to infer the private data of the clients by analysing the exposed smashed data. The attacker only needs access to a small amount of training data to develop a data reconstruction mechanism comparable to FSHA. Notably, PCAT does not disrupt the primary training process, making it challenging to detect. While previous attacks often rely on strong assumptions or target easily exploitable models, Zhu et al. introduce a more practical approach in [48]. They present Simulator Decoding with Adversarial Regularisation (SDAR), which leverages auxiliary data and adversarial regularisation to learn a decodable simulator of the client’s private model. When applied against SL with a semi-trusted server, SDAR can effectively infer the client’s private features in vanilla SL, and both features and labels in u-shaped SL.

3.2 Defence approaches

Data encryption. One potential solution is to utilise privacy-preserving techniques to encrypt the model and data, allowing different organisations to use a model held by another organisation without revealing their proprietary information. Two well-known techniques that enable computations over encrypted data while preserving privacy are Homomorphic Encryption (HE) [2] and Secure Multi-Party Computation (SMPC) [22]. Both appear to be promising solutions, as they enable computations on encrypted data without disclosing the underlying information. However, underlying challenges can complicate their implementation, such as the computational complexity of HE and the communication costs of SMPC [3].
While numerous privacy-preserving ML works employ HE to protect users’ inputs, relatively few combine HE with SL. Pereteanu et al. [28] propose a method in which the server model part is divided into private sections separated by a public section, accessible in plain text by the client, to expedite classification while utilising HE. This approach is limited to client input classification and does not allow a client to customise a model part for their private dataset. Recently, Khan et al. [15] introduce an approach that combines SL and HE, in which the client encrypts the smashed data before sending it to the server. However, one limitation of this hybrid approach is that, during backward propagation, a server can extract valuable information about the client’s input data by exploiting the gradients sent from the client, potentially leading to privacy breaches. In response to this concern, Nguyen et al. [26] propose an enhanced protocol that mitigates the data leakage issue in [15] while offering improved speed and reduced communication overhead.
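To illustrate the general flow of encrypting smashed data before transmission, the sketch below uses the open-source TenSEAL library with the CKKS scheme. The parameter choices and the toy server-side computation are assumptions for illustration and do not reproduce the protocols of [15, 26].

```python
import tenseal as ts

# Client-side setup: a CKKS context; the secret key never leaves the client.
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()

smashed = [0.12, -0.53, 0.88, 0.07]           # toy split-layer activations
enc_smashed = ts.ckks_vector(ctx, smashed)    # encrypted before leaving the client

# Server-side: operate on ciphertexts only, e.g. a plaintext-weighted linear map.
weights = [0.5, -1.0, 0.3, 0.2]
enc_out = enc_smashed.dot(weights)            # homomorphic dot product, stays encrypted

# Back on the client: only the secret-key holder can decrypt the result.
print(enc_out.decrypt())
```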
More recently, Khan et al. [16, 17] devise a protocol that allows u-shaped SL to operate on homomorphically encrypted data. In their approach, the client applies HE to the smashed data before sending it to the server, effectively protecting user data privacy. However, their work is primarily focused on 1D time-series data, such as ECG signals, and is limited to a single client. Extending this approach to multiple clients, which would necessitate the use of a multi-key HE scheme, is left by the authors for future work.
Data decorrelation. In response to the potential data leakage from SL smashed data, Abuadbba et al. [1] explore a strategy to mitigate privacy risks by introducing additional hidden layers to the local model, specifically, by adding more convolutional layers to the client before the split layer. This approach results in a more complex model architecture while maintaining a constant number of layers held by the server. Their evaluation reveals a slight reduction in the distance correlation between the input and smashed data as the number of hidden convolution layers increases. However, some highly correlated channels still remain, indicating the possibility of significant leakage and the potential for raw data reconstruction.
Another approach, proposed in [44, 45] by Vepakomma et al., aims to bolster privacy safeguards within SL by introducing a loss term based on distance correlation into the overall loss function. Distance correlation (DC), a metric for assessing the statistical interdependence of random variables, is utilised to minimise the correlation between the original input and the smashed data. Jointly optimising the standard loss function and the DC loss term is designed to reduce the amount of information present in the smashed data that could be used for raw data reconstruction, while still preserving the model’s accuracy. It is important to highlight that introducing the additional DC term to the server’s loss function could pose privacy risks, potentially enabling attackers to reconstruct the original input data if they possess both the DC value and access to the smashed data transmitted over the network. To address this concern, Turina et al. [40] introduce a client-based privacy protection method integrated into a hybrid FL-SL framework. This approach employs two distinct loss functions, one dedicated to the clients and the other to the server. The first loss function prioritises privacy and includes elements such as DC or differential privacy, functioning exclusively on the client side. The second (global) loss function is calculated on the server and extends its influence over both clients and the server throughout the training process. Empirical evidence underscores the effectiveness of this approach in maintaining data privacy in both hybrid FL-SL and parallel SL setups. Moreover, the client-based privacy approach employing DC outperforms the noise-based approach in balancing privacy and model accuracy.
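A NoPeek-style client objective can be sketched as a weighted sum of the task loss and a differentiable distance-correlation penalty between the raw input and the smashed data. The compact PyTorch implementation below (paralleling the NumPy metric sketched earlier) and the weight `alpha` are illustrative choices, not the authors’ code.

```python
import torch

def dcor(x, z):
    """Differentiable (biased) distance correlation between two batches."""
    x, z = x.flatten(1), z.flatten(1)
    def centred(a):
        d = torch.cdist(a, a)
        return d - d.mean(0, keepdim=True) - d.mean(1, keepdim=True) + d.mean()
    A, B = centred(x), centred(z)
    dcov2 = (A * B).mean()
    return torch.sqrt(dcov2 / (torch.sqrt((A * A).mean() * (B * B).mean()) + 1e-9))

def client_loss(task_loss, x, smashed, alpha=0.1):
    # Combined objective: keep the task accurate while decorrelating input and smashed data.
    return task_loss + alpha * dcor(x, smashed)
```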
Another engineering-driven approach to minimise the information transmitted in SL is through the selective pruning of channels in the client-side smashed data, as demonstrated by Singh et al. in [34]. Learning a pruning filter to selectively remove channels in the latent representation space at the split layer is empirically shown to prevent various state-of-the-art reconstruction attacks during the prediction step in private collaborative inference scenarios.
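A toy version of such channel pruning, assuming a learnable per-channel gate with an L1 sparsity penalty rather than the exact obfuscation filter of [34], might look as follows.

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Learnable per-channel gate applied to the smashed data at the split layer.
    Channels whose gates shrink towards zero are effectively pruned before transmission."""
    def __init__(self, num_channels):
        super().__init__()
        self.gate = nn.Parameter(torch.ones(num_channels))

    def forward(self, smashed):                       # smashed: (B, C, H, W)
        return smashed * self.gate.view(1, -1, 1, 1)

    def sparsity_penalty(self):
        return self.gate.abs().sum()                  # added to the client's training loss

# Illustrative client objective: task_loss + lam * gate.sparsity_penalty()
```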
Quantisation provides another avenue for decorrelating input and smashed data. Yu et al. [46] introduce the Stepwise activation function to render activation outputs irreversible, with the effectiveness of this approach depending on the Stepwise parameters and exhibiting trade-offs between accuracy and privacy preservation. An extreme quantisation approach, known as Binarised SL (B-SL), is proposed by Pham et al. in [29]; it binarises the local SL model, including the smashed data exposed to the server. The binarisation process introduces latent noise into the smashed data, effectively diminishing the server’s capacity to reconstruct the original training data. Furthermore, the authors incorporate an extra loss term alongside the standard model accuracy loss, aiming to minimise the leakage of locally sensitive data. Note that the loss term in the B-SL framework is versatile and not restricted to the DC term used in [44]. Additionally, the authors provide three methods for implementing differential privacy within the B-SL framework to ensure privacy guarantees. Experimental results reported in [29] demonstrate the effectiveness of B-SL in mitigating privacy vulnerabilities under FSHA attacks.
In a different approach, Qiu et al. [32] recommend the adoption of hashing as a protective measure against reconstruction attacks. Their approach entails implementing the Sign function on the smashed data before sending the outcomes to the server, making data reconstruction exceedingly challenging. To preserve the model’s trainability with the Sign function, the authors leverage techniques like batch normalisation and the straight-through estimator. Both of these methods contribute to reinforcing the defence against reconstruction attacks while upholding high accuracy.
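To illustrate how a Sign step can remain trainable, the sketch below defines a generic straight-through estimator in PyTorch: the forward pass emits binarised values, while the backward pass propagates gradients as if the operation were the identity. This is a generic construction, not the exact mechanism of [32] or [29].

```python
import torch

class SignSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)         # binarised smashed data sent to the server

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output           # straight-through: treat sign() as the identity

# Usage inside the client model, right before transmission (hypothetical `head` module):
# smashed = SignSTE.apply(head(x))
```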
Noise-based mechanisms. Noise-based mechanisms offer a defence strategy that avoids the heavy computational burden of cryptographic primitives. These mechanisms adaptively inject noise into the smashed data while retaining the server’s ability to perform its tasks. Noise can be treated as an additional set of trainable parameter probabilities, which can be gradually eliminated through end-to-end self-supervised training [4]. For instance, Shredder [24], proposed by Mireshghallah et al., achieves an asymmetric balance between accuracy and privacy by adding noise as part of the gradient-based learning process, effectively reducing the information content of the smashed data sent by clients to servers for inference. Similarly, Abuadbba et al. [1] and Titcombe et al. [37] apply noise to the smashed data before transmitting it to the server, framing this defence as a differential privacy (DP) mechanism [6].
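A minimal sketch of this kind of defence, assuming Laplace noise and a hand-picked scale, is shown below; tuning the scale controls the privacy/accuracy trade-off discussed next.

```python
import torch

def noisy_smashed(smashed, scale=0.5):
    """Add Laplace noise to the split-layer activations before sending them to the server.
    The scale (and hence the privacy/accuracy trade-off) is an illustrative choice."""
    noise = torch.distributions.Laplace(0.0, scale).sample(smashed.shape)
    return smashed + noise
```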
In another approach, Mahawaga Arachchige et al. [23] provide a differentially private mechanism for sharing activations following a flattening layer that comes after the convolutional and pooling layers. These flattened outputs are binarised, and a utility-enhanced randomisation mechanism, inspired by RAPPOR [9], is applied to create a differentially private binary representation. These binary representations are then sent to the server, where fully connected layers perform the final predictions. Integrating DP, Vepakomma et al. [41] propose PrivateMail, a differentially private mechanism for supervised manifold embeddings of features extracted from deep networks for image retrieval tasks. PrivateMail is claimed to achieve a substantially improved balance between privacy and utility compared to several baselines. More recently, Ryu et al. [33] conduct a systematic study to assess the effectiveness of DP in collaborative inference against reconstruction attacks. In summary, noise-based mechanisms can defend against data leakage, whether directly from smashed data or from reconstruction attacks. However, it is important to note that the addition of noise can significantly impact the model’s accuracy, even at modest noise levels [37].
Protecting from model inversion attacks. DP, while effective in preventing data leakage, often comes at the cost of model accuracy. Recently, Pham et al. [31] develop a new SL framework in which client-side data privacy is enhanced without the need for sequential model sharing between clients. By disallowing the sharing of local models among clients, the risk of local models being inverted for data reconstruction is reduced. The authors demonstrate that this non-local-sharing SL can reduce leakage due to model inversion attacks by half, though attackers can still attempt to reconstruct private data by modifying the Deep Leakage attack [47].
To safeguard hybrid SL-FL from model inversion threats, Li et al. [21] propose a model inversion-resistant framework called ResSFL. ResSFL involves two key steps: an initial pre-training phase that constructs a feature extractor designed to withstand model inversion, followed by a subsequent resistance transfer phase that employs this feature extractor to initialise client-side models. During the pre-training phase, an attacker-aware training technique is employed, mimicking an attacker with a robust inversion model and introducing bottleneck layers to limit the feature space. Typically, this pre-training is conducted by an expert, often a powerful client or a third party with sufficient computational resources. In the second phase, the robust feature extractor is utilised to initialise the SL-FL training scheme for a new task. In another study, Khowaja et al. [18] propose a method that segments raw data into patches, rendering the recovery of the original data more challenging. The authors emphasise the growing concern regarding model security and suggest that their proximal gradient-based learning networks could effectively thwart model inversion attacks. Results from these studies indicate that the reconstructed version often fails to yield meaningful information.
Protecting from feature-space hijacking attacks. The division of the model into client and server parts in SL introduces a unique type of inference attack, enabling a malicious server to influence the client model’s training process and infer training data, as exemplified by FSHA. Regarding defence mechanisms, Erdogan et al. [7] argue that the direction of the client’s parameter updates under FSHA is unrelated to the primary task. Consequently, introducing a small amount of erroneous data during the training process enables clients to monitor changes in the gradient information provided by the server, assisting in the detection of any malicious behaviour. However, Fu et al. [10] put forward an attack strategy aimed at circumventing this detection mechanism. They clarify that the malicious server in FSHA is fundamentally constructing an auto-encoder, a behaviour that clients can identify by comparing the expected model gradients with those of an auto-encoder.
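The sketch below shows a toy heuristic in the spirit of such client-side checks: gradients returned for deliberately mislabelled (fake) batches are compared against those for regular batches, and near-identical directions are treated as suspicious. The score and the example threshold are our illustrative simplifications, not the actual SplitGuard or gradient-scrutiniser metrics of [7, 10].

```python
import torch
import torch.nn.functional as F

def hijack_score(fake_grads, real_grads):
    """Toy detection heuristic: if the split-layer gradients returned for deliberately
    mislabelled (fake) batches look just like those for real batches, the server may not
    be optimising the advertised task. Inputs are lists of same-shaped gradient tensors."""
    fake = torch.stack([g.flatten() for g in fake_grads]).mean(0)
    real = torch.stack([g.flatten() for g in real_grads]).mean(0)
    return F.cosine_similarity(fake, real, dim=0).item()   # values near 1.0 are suspicious

# Example policy (hypothetical): pause training if the score stays above 0.9 for many rounds.
```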
Research conducted by Gawron and Stubbings [12] highlights that DP might not provide adequate protection against FSHA. In their investigation, they apply FSHA to SL protected by DP using a client-side DP optimiser. The empirical findings suggest that while DP can delay the convergence of FSHA, this attack method still successfully reconstructs the client’s private data with a low margin of error at various DP settings. Furthermore, the authors explore the utilisation of dimensionality reduction techniques applied directly to the raw data before training as a privacy protection measure. This approach is found to partially mitigate FSHA but could impact model accuracy, especially when dealing with large datasets.

4 Discussion and Conclusion

Protecting privacy in SL can be achieved through the use of cryptographic privacy-preserving machine learning systems. However, these cryptographic methods come with substantial computational and communication costs and may not be practical in many scenarios. Even with computational enhancements, they may still be impractical for SL with encrypted data [15, 16, 17, 26].
While the potential for data leakage through smashed data visualisation is relatively minor when compared to inversion attacks, it can be efficiently addressed using techniques such as DP, decorrelation, or quantisation, albeit with some degree of accuracy loss [29]. DP is also a key mechanism for defending against model inversion attacks, introducing a trade-off between model accuracy and privacy preservation. Analysing these trade-offs is crucial to guide practitioners in selecting the appropriate noise level for their desired utility [30].
One inherent vulnerability in SL is the exposure of smashed data: the client must accept the computed gradients from the server, making it a potential target for hijacking attacks. It is important to note that, for security, we normally assume the presence of non-trustworthy participants, whereas for privacy, we aim to ensure that all participants adhere to the procedure (i.e., are trustworthy). However, specific attacks, such as FSHA [27], fall outside the category of semi-honest attacks, as they involve modifications to the procedure that can affect the learning objectives. These attacks pose significant challenges to privacy preservation in SL and may necessitate a redesign of SL from scratch, as suggested by some researchers.
In conclusion, this paper presents a survey of potential data leakage in the recently emerged field of SL. We review various types of attacks on smashed data that can reveal raw input data, including visual invertibility, model inversion, and feature-space hijacking. Additionally, we summarise a range of defence mechanisms and analyse their efficacy. However, it is important to recognise that there are trade-offs between accuracy and privacy, which require careful consideration by practitioners. This survey focuses specifically on data leakage concerns and serves as a foundational study for a deeper exploration of various privacy attacks in SL.

References

[1]
Sharif Abuadbba, Kyuyeon Kim, Minki Kim, Chandra Thapa, Seyit A. Camtepe, Yansong Gao, Hyoungshick Kim, and Surya Nepal. 2020. Can We Use Split Learning on 1D CNN Models for Privacy Preserving Training?. In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security. 305–318.
[2]
Abbas Acar, Hidayet Aksu, A. Selcuk Uluagac, and Mauro Conti. 2018. A Survey on Homomorphic Encryption Schemes: Theory and Implementation. Comput. Surveys 51, 4 (2018), 35 pages.
[3]
Jose Cabrero-Holgueras and Sergio Pastrana. 2021. SoK: Privacy-Preserving Computation Techniques for Deep Learning. Proceedings on Privacy Enhancing Technologies 2021, 4 (2021), 139–162.
[4]
Zhuoqing Chang, Shubo Liu, Xingxing Xiong, Zhaohui Cai, and Guoqing Tu. 2021. A Survey of Recent Advances in Edge-Computing-Powered Artificial Intelligence of Things. IEEE Internet of Things Journal 8, 18 (2021), 13849–13875.
[5]
Qiang Duan, Shijing Hu, Ruijun Deng, and Zhihui Lu. 2022. Combined Federated and Split Learning in Edge Computing for Ubiquitous Intelligence in Internet of Things: State-of-the-Art and Future Directions. Sensors 22, 16 (2022).
[6]
Cynthia Dwork. 2008. Differential Privacy: A Survey of Results. In Theory and Applications of Models of Computation, Manindra Agrawal, Dingzhu Du, Zhenhua Duan, and Angsheng Li (Eds.). 1–19.
[7]
Ege Erdogan, Alptekin Kupcu, and A. Ercument Cicek. 2022. SplitGuard: Detecting and Mitigating Training-Hijacking Attacks in Split Learning. In Proceedings of the 21st Workshop on Privacy in the Electronic Society. 125–137.
[8]
Ege Erdogan, Alptekin Kupcu, and A. Ercument Cicek. 2022. UnSplit: Data-Oblivious Model Inversion, Model Stealing, and Label Inference Attacks against Split Learning. In Proceedings of the 21st Workshop on Privacy in the Electronic Society. 115–124.
[9]
Ulfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. 1054–1067.
[10]
Jiayun Fu, Xiaojing Ma, Bin B. Zhu, Pingyi Hu, Ruixin Zhao, Yaru Jia, Peng Xu, Hai Jin, and Dongmei Zhang. 2023. Focusing on Pinocchio’s Nose: A Gradients Scrutinizer to Thwart Split-Learning Hijacking Attacks Using Intrinsic Attributes. In Network and Distributed System Security Symposium. 18 pages.
[11]
Xinben Gao and Lan Zhang. 2023. PCAT: Functionality and Data Stealing from Split Learning by Pseudo-Client Attack.
[12]
Grzegorz Gawron and Philip Stubbings. 2022. Feature Space Hijacking Attacks against Differentially Private Split Learning.
[13]
Otkrist Gupta and Ramesh Raskar. 2018. Distributed learning of deep neural network over multiple agents. Journal of Network and Computer Applications 116 (2018), 8 pages.
[14]
Zecheng He, Tianwei Zhang, and Ruby B. Lee. 2019. Model Inversion Attacks against Collaborative Inference. In Proceedings of the 35th Annual Computer Security Applications Conference. 148–162.
[15]
Tanveer Khan, Khoa Nguyen, and Antonis Michalas. 2023. Split Ways: Privacy-Preserving Training of Encrypted Data Using Split Learning. In 5th International Workshop on Health Data Management in the Era of AI. 8 pages.
[16]
Tanveer Khan, Khoa Nguyen, and Antonis Michalas. 2024. A More Secure Split: Enhancing the Security of Privacy-Preserving Split Learning. In Secure IT Systems. 307–329.
[17]
Tanveer Khan, Khoa Nguyen, Antonis Michalas, and Alexandros Bakas. 2023. Love or Hate? Share or Split? Privacy-Preserving Training Using Split Learning and Homomorphic Encryption.
[18]
Sunder Ali Khowaja, Ik Hyun Lee, Kapal Dev, Muhammad Aslam Jarwar, and Nawab Muhammad Faseeh Qureshi. 2022. Get Your Foes Fooled: Proximal Gradient Split Learning for Defense Against Model Inversion Attacks on IoMT Data. IEEE Transactions on Network Science and Engineering (2022), 1–10.
[19]
Jakub Konecny, H. Brendan McMahan, Felix X. Yu, Peter Richtarik, Ananda Theertha Suresh, and Dave Bacon. 2016. Federated Learning: Strategies for Improving Communication Efficiency. In NIPS Workshop on Private Multi-Party Machine Learning.
[20]
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
[21]
Jingtao Li, Adnan Siraj Rakin, Xing Chen, Zhezhi He, Deliang Fan, and Chaitali Chakrabarti. 2022. ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10194–10202.
[22]
Yehuda Lindell. 2020. Secure Multiparty Computation. Communications of the ACM 64, 1 (2020), 86–96.
[23]
Pathum Chamikara Mahawaga Arachchige, Peter Bertok, Ibrahim Khalil, Dongxi Liu, Seyit Camtepe, and Mohammed Atiquzzaman. 2020. Local Differential Privacy for Deep Learning. IEEE Internet of Things Journal 7, 7 (2020), 5827–5842.
[24]
Fatemehsadat Mireshghallah, Mohammadkazem Taram, Prakash Ramrakhyani, Ali Jalali, Dean Tullsen, and Hadi Esmaeilzadeh. 2020. Shredder: Learning Noise Distributions to Protect Inference Privacy. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. 3–18.
[25]
Viraaji Mothukuri, Reza M. Parizi, Seyedamin Pouriyeh, Yan Huang, Ali Dehghantanha, and Gautam Srivastava. 2021. A survey on security and privacy of federated learning. Future Generation Computer Systems 115 (2021), 619–640.
[26]
Khoa Nguyen, Tanveer Khan, and Antonis Michalas. 2023. Split Without a Leak: Reducing Privacy Leakage in Split Learning.
[27]
Dario Pasquini, Giuseppe Ateniese, and Massimo Bernaschi. 2021. Unleashing the Tiger: Inference Attacks on Split Learning. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. 2113–2129.
[28]
George-Liviu Pereteanu, Amir Alansary, and Jonathan Passerat-Palmbach. 2022. Split HE: Fast Secure Inference Combining Split Learning and Homomorphic Encryption.
[29]
Ngoc Duy Pham, Alsharif Abuadbba, Yansong Gao, Khoa Tran Phan, and Naveen Chilamkurti. 2023. Binarizing Split Learning for Data Privacy Enhancement and Computation Reduction. IEEE Transactions on Information Forensics and Security 18 (2023), 3088–3100.
[30]
Ngoc Duy Pham, Khoa Tran Phan, and Naveen Chilamkurti. 2023. Enhancing Accuracy-Privacy Trade-off in Differentially Private Split Learning.
[31]
Ngoc Duy Pham, Tran Khoa Phan, Alsharif Abuadbba, Doan Nguyen, and Naveen Chilamkurti. 2022. Split Learning without Local Weight Sharing to Enhance Client-side Data Privacy.
[32]
Pengyu Qiu, Xuhong Zhang, Shouling Ji, Yuwen Pu, and Ting Wang. 2022. All You Need Is Hashing: Defending Against Data Reconstruction Attack in Vertical Federated Learning.
[33]
Jihyeon Ryu, Yifeng Zheng, Yansong Gao, Alsharif Abuadbba, Junyaup Kim, Dongho Won, Surya Nepal, Hyoungshick Kim, and Cong Wang. 2022. Can differential privacy practically protect collaborative deep learning inference for IoT? Wireless Networks (2022).
[34]
Abhishek Singh, Ayush Chopra, Ethan Garza, Emily Zhang, Praneeth Vepakomma, Vivek Sharma, and Ramesh Raskar. 2021. DISCO: Dynamic and Invariant Sensitive Channel Obfuscation for Deep Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12125–12135.
[35]
Chandra Thapa, M. A. P. Chamikara, and Seyit A. Camtepe. 2021. Advancements of Federated Learning Towards Privacy Preservation: From Federated Learning to Split Learning. 79–109.
[36]
Chandra Thapa, Chamikara Mahawaga Arachchige, Seyit Camtepe, and Lichao Sun. 2022. SplitFed: When Federated Learning Meets Split Learning. In 36th AAAI Conference on Artificial Intelligence. 8 pages.
[37]
Tom Titcombe, Adam J. Hall, Pavlos Papadopoulos, and Daniele Romanini. 2021. Practical Defences Against Model Inversion Attacks for Split Neural Networks.
[38]
Nam–Phuong Tran, Nhu–Ngoc Dao, The-Vi Nguyen, and Sungrae Cho. 2022. Privacy-Preserving Learning Models for Communication: A tutorial on Advanced Split Learning. In 2022 13th International Conference on Information and Communication Technology Convergence. 1059–1064.
[39]
Nguyen Truong, Kai Sun, Siyao Wang, Florian Guitton, and YiKe Guo. 2021. Privacy preservation in federated learning: An insightful survey from the GDPR perspective. Computers & Security 110 (2021), 102402.
[40]
Valeria Turina, Zongshun Zhang, Flavio Esposito, and Ibrahim Matta. 2021. Federated or Split? A Performance and Privacy Analysis of Hybrid Split and Federated Learning Architectures. In 2021 IEEE 14th International Conference on Cloud Computing. 250–260.
[41]
Praneeth Vepakomma, Julia Balla, and Ramesh Raskar. 2022. PrivateMail: Supervised Manifold Learning of Deep Features with Privacy for Image Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence 36, 8 (2022), 8503–8511.
[42]
Praneeth Vepakomma, Otkrist Gupta, Tristan Swedish, and Ramesh Raskar. 2018. Split learning for health: Distributed deep learning without sharing raw patient data.
[43]
Praneeth Vepakomma and Ramesh Raskar. 2022. Split Learning: A Resource Efficient Model and Data Parallel Approach for Distributed Deep Learning. 439–451.
[44]
P. Vepakomma, A. Singh, O. Gupta, and R. Raskar. 2020. NoPeek: Information leakage reduction to share activations in distributed deep learning. In 2020 International Conference on Data Mining Workshops. 933–942.
[45]
Praneeth Vepakomma, Abhishek Singh, Emily Zhang, Otkrist Gupta, and Ramesh Raskar. 2021. NoPeek-Infer: Preventing face reconstruction attacks in distributed inference after on-premise training. In 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition. 1–8.
[46]
Chun-Hsien Yu, Chun-Nan Chou, and Emily Chang. 2019. Distributed Layer-Partitioned Training for Privacy-Preserved Deep Learning. In 2019 IEEE Conference on Multimedia Information Processing and Retrieval. 343–346.
[47]
Ligeng Zhu, Zhijian Liu, and Song Han. 2019. Deep Leakage from Gradients. In Advances in Neural Information Processing Systems, Vol. 32.
[48]
Xiaochen Zhu, Xinjian Luo, Yuncheng Wu, Yangfan Jiang, Xiaokui Xiao, and Beng Chin Ooi. 2023. Passive Inference Attacks on Split Learning via Adversarial Regularization.
