3.1 Attack strategies
Data leakage in SL can occur when the raw input data is inferred directly from the exposed smashed data through straightforward invertibility or more sophisticated reconstruction techniques.
Visual invertibility attack. Abuadbba et al. [
1] are the first to apply SL to a one-dimensional Convolutional Neural Network (1D-CNN) and demonstrate that the smashed data sent from the client to the server still retains a significant amount of information about the original data. The smashed data exhibits a high degree of similarity to the original input, suggesting the potential for substantial leakage. This leakage is quantified using distance correlation and dynamic time-warping metrics, confirming a high risk that the smashed data can be exploited to reconstruct the raw input data. A similar observation can be found in the research conducted by Pham et al. [
29], where the authors identify potential data leakage in CNN-based SL applied to 2D-image data. Fig.
3 illustrates the potential leakage of raw data at the split layer in a typical 2D-CNN. Feature maps plotted from a channel in the first convolution and sub-sampling layers closely resemble the raw input image.
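To make the leakage measurement concrete, the sketch below computes the (biased) sample distance correlation between a batch of raw inputs and the corresponding smashed data; the shapes, the random stand-in for the client activations, and the function name are illustrative assumptions rather than details taken from [1].

```python
import numpy as np

def distance_correlation(x, z):
    """Biased sample distance correlation between two batches of flattened vectors.

    x: (n, d_x) raw inputs, one flattened sample per row
    z: (n, d_z) smashed data, one flattened sample per row
    Returns a value in [0, 1]; values close to 1 indicate strong statistical dependence.
    """
    def centred_distances(a):
        d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
        return d - d.mean(0, keepdims=True) - d.mean(1, keepdims=True) + d.mean()

    A, B = centred_distances(x), centred_distances(z)
    dcov2_xy = max((A * B).mean(), 0.0)   # guard against tiny negative values from rounding
    dcov2_xx = (A * A).mean()
    dcov2_yy = (B * B).mean()
    return np.sqrt(dcov2_xy / np.sqrt(dcov2_xx * dcov2_yy))

# Illustrative usage: a high value suggests the smashed data still carries
# much of the information present in the raw inputs.
raw = np.random.randn(64, 28 * 28)                 # e.g. flattened images
smashed = raw @ np.random.randn(28 * 28, 128)      # stand-in for client activations
print(distance_correlation(raw, smashed))
```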
Feature-space hijacking attack. In [
27], Pasquini et al. introduce a novel attack strategy known as the Feature-Space Hijacking Attack (FSHA), which enables a malicious server to recover private input data during the training process of SL. In FSHA, the server takes control of the learning process of the client models, steering them into a vulnerable state that can be exploited to infer the input data. The authors observe that a server-side attacker can manipulate the direction of the client model's optimisation by influencing the training process, and they exploit this to achieve high-quality reconstruction of the client's private data. However, the effectiveness of this attack is contingent on access to a substantial volume of training data that matches the distribution of the client's data.
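The core of the hijacking loop can be sketched as follows; the pilot encoder, decoder, discriminator, their architectures, and the loss formulation are illustrative placeholders for the components described in [27], not a faithful reimplementation.

```python
import torch
import torch.nn.functional as F

# Illustrative components; architectures and sizes are placeholders.
pilot = torch.nn.Linear(784, 128)      # server's substitute ("pilot") encoder
decoder = torch.nn.Linear(128, 784)    # approximate inverse of the pilot
disc = torch.nn.Linear(128, 1)         # feature-space discriminator

opt_pilot = torch.optim.Adam(list(pilot.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-3)

def malicious_server_step(smashed, x_pub):
    """One hijacking step on the server.

    smashed: activations received from the client (tracked for autograd, as in SL)
    x_pub:   a public batch assumed to follow a distribution similar to the client's data
    """
    x_pub = x_pub.flatten(1)

    # 1. Train pilot + decoder so that decoder(pilot(x)) reconstructs x.
    rec_loss = F.mse_loss(decoder(pilot(x_pub)), x_pub)
    opt_pilot.zero_grad(); rec_loss.backward(); opt_pilot.step()

    # 2. Train the discriminator to tell pilot features (label 1) from client features (label 0).
    real = disc(pilot(x_pub).detach())
    fake = disc(smashed.detach())
    d_loss = F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) + \
             F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # 3. Instead of a genuine task loss, the server returns a loss whose gradient
    #    (w.r.t. the smashed data) pushes the client's features towards the pilot's
    #    feature space; that gradient is what gets sent back to the client.
    scores = disc(smashed)
    return F.binary_cross_entropy_with_logits(scores, torch.ones_like(scores))

# Once the client's feature space aligns with the pilot's, decoder(smashed)
# approximates the client's private inputs.
```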
Model inversion attacks. Diverging from FSHA, the model inversion attack, as outlined by Erdogan et al. in [
8], pursues a distinct objective. This attack aims to obtain a functionally equivalent model to the client's model and to recover the raw training data without any prior knowledge of the client's dataset. The sole assumption is that the attacker knows the architecture of the client-side model. Without data resembling the training data and unable to query the client model, the attacker must search the entire space of possible input values and client model parameters. The attacker rigorously adheres to the SL protocol and needs only access to the smashed data to perform the model inversion and data inference, which makes such an attack difficult for clients to detect. In [
14], He et al. explore various attack scenarios: (i) the white-box scenario, where the attacker has access to the local model and uses it to reconstruct the images; (ii) the black-box scenario, where the attacker lacks knowledge of the local model but can query it to recreate a similar one; and (iii) the query-free scenario, where the attacker cannot query the client but aims to construct a substitute model for data reconstruction. The last scenario yields the least favourable results, as expected, given the limited capabilities of the attacker. Additionally, the architecture of the model and the division of layers between the client and the server influence the quality of reconstruction. Having fewer layers in the client generally leads to better reconstruction by the centralised server.
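In the white-box scenario, reconstruction essentially reduces to optimising a candidate input so that the known client model reproduces the intercepted smashed data. The sketch below illustrates this under assumed shapes and a plain mean-squared-error objective (regularisers such as total variation, often used in practice, are omitted).

```python
import torch

def whitebox_invert(client_model, observed_smashed, input_shape, steps=2000, lr=0.1):
    """Optimise an input so that the known client model reproduces the observed smashed data.

    client_model:     the client-side network (known to the attacker in the white-box case)
    observed_smashed: smashed data intercepted at the split layer
    input_shape:      e.g. (1, 1, 28, 28) -- an illustrative assumption
    """
    x = torch.randn(input_shape, requires_grad=True)   # random initial guess
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(client_model(x), observed_smashed)
        loss.backward()
        opt.step()
    return x.detach()
```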
In another approach, Gao and Zhang [
11] propose a passive inference attack named Pseudo-Client ATtack (PCAT), in which the server adheres to the SL training protocol but attempts to infer the private data of the clients by analysing the exposed smashed data. The attacker needs access to only a small amount of training data to develop a data reconstruction mechanism comparable to FSHA. Notably, PCAT does not disrupt the primary training process, making it challenging to detect. While previous attacks often rely on strong assumptions or target easily exploitable models, Zhu et al. introduce a more practical approach in [
48]. They present Simulator Decoding with Adversarial Regularisation (SDAR), which leverages auxiliary data and adversarial regularisation to learn a decodable simulator of the client’s private model. SDAR, when applied against SL with a semi-trusted server, can effectively infer the client’s private features in vanilla SL, and both features and labels in u-shaped SL.
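A rough sketch of the pseudo-client idea follows, under the simplifying assumptions that the server holds the labels (label-sharing SL) and a small auxiliary dataset; the module sizes, the joint training schedule, and the loss weighting are illustrative and not the exact procedure of [11].

```python
import torch
import torch.nn.functional as F

# The server trains a pseudo-client and a decoder on a small auxiliary dataset,
# while (not shown) it keeps honestly training its own model part on the real
# smashed data it receives from the client.
pseudo_client = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU())
decoder = torch.nn.Linear(128, 784)
opt = torch.optim.Adam(list(pseudo_client.parameters()) + list(decoder.parameters()), lr=1e-3)

def pseudo_client_step(server_model, x_aux, y_aux):
    """Align the pseudo-client with the real client via the shared server model,
    and learn a decoder from that feature space back to the inputs."""
    z = pseudo_client(x_aux.flatten(1))
    task_loss = F.cross_entropy(server_model(z), y_aux)   # ties z to the server's feature space
    rec_loss = F.mse_loss(decoder(z), x_aux.flatten(1))   # learn to invert that feature space
    opt.zero_grad()
    (task_loss + rec_loss).backward()
    opt.step()

# At attack time the server simply applies `decoder` to the real client's smashed data.
```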
3.2 Defence approaches
Data encryption. One potential solution is to utilise privacy-preserving techniques that encrypt the model and data, allowing one organisation to use a model held by another without revealing its proprietary information. Two well-known techniques that enable computation over encrypted data while preserving privacy are Homomorphic Encryption (HE) [
2] and Secure Multi-Party Computation (SMPC) [
22]. Both techniques appear to be promising solutions, as they enable computations on encrypted data without disclosing the underlying information. However, underlying challenges can complicate their implementation, such as the computational complexity of HE and the communication costs of SMPC [
3].
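As a flavour of the SMPC side, the snippet below shows plain additive secret sharing over a prime field, the building block used by many SMPC frameworks: neither share alone reveals the secret, yet additions can be carried out share-wise. This is a textbook illustration and is not tied to any particular SL protocol.

```python
import secrets

P = 2**61 - 1  # public prime modulus

def share(secret):
    """Split an integer secret into two additive shares modulo P."""
    r = secrets.randbelow(P)
    return r, (secret - r) % P

def reconstruct(s1, s2):
    return (s1 + s2) % P

# Each party holds one share of every value; sums are computed locally on the
# shares and only the final result is reconstructed.
a1, a2 = share(42)
b1, b2 = share(100)
sum1 = (a1 + b1) % P    # held by party 1
sum2 = (a2 + b2) % P    # held by party 2
assert reconstruct(sum1, sum2) == 142
```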
While numerous privacy-preserving ML works employ HE to protect users' inputs, relatively few combine HE with SL. Pereteanu et al. [
28] propose a method in which the server-side model is divided into private sections separated by a public section that the client can access in plaintext, in order to speed up classification while utilising HE. This approach is limited to classifying client inputs and does not allow a client to customise a model part for its private dataset. Recently, Khan et al. [
15] introduce an approach that combines SL and HE, in which the client encrypts the smashed data before sending it to the server. However, one limitation of this hybrid approach is that, during backward propagation, the server can extract valuable information about the client's input data by exploiting the gradients sent from the client, potentially leading to privacy breaches. In response to this concern, Nguyen et al. [
26] propose an enhanced protocol to mitigate this data leakage issue in [
15], offering improved speed and reduced communication overhead.
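On the HE side, the sketch below shows a client encrypting its smashed activations with the CKKS scheme (here via the TenSEAL library) before sending them to the server, which can then evaluate a simple linear operation directly on the ciphertext; the encryption parameters, vector sizes, and the plaintext dot product are illustrative and do not reproduce the protocols of [15] or [26].

```python
import numpy as np
import tenseal as ts

# CKKS context; parameters are illustrative and not tuned for security or accuracy.
context = ts.context(ts.SCHEME_TYPE.CKKS,
                     poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()

smashed = np.random.randn(64)                             # stand-in for the client's activations
enc_smashed = ts.ckks_vector(context, smashed.tolist())   # encrypted before leaving the client

# The server can evaluate simple linear operations on the ciphertext, e.g. a dot
# product with its own plaintext weight vector.
server_weights = np.random.randn(64)
enc_out = enc_smashed.dot(server_weights.tolist())

# Only the client, which holds the secret key, can decrypt the result.
print(enc_out.decrypt())
```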
More recently, Khan et al. [
16,
17] devise a protocol that allows u-shaped SL to operate on homomorphically encrypted data. In their approach, the client applies HE to the smashed data before sending it to the server, effectively protecting user data privacy. However, their work focuses primarily on 1D time-series data, such as ECG signals, and is limited to a single client. Extending the approach to multiple clients, which would require a multi-key HE scheme, is left as future work by the authors.
Data decorrelation. In response to the potential data leakage from SL smashed data, Abuadbba et al. [
1] explore a strategy to mitigate privacy risks by introducing additional hidden layers into the local model, specifically by adding more convolutional layers on the client side before the split layer. This results in a more complex client model while keeping the number of layers held by the server constant. Their evaluation reveals a slight reduction in the distance correlation between the input and the smashed data as the number of hidden convolutional layers increases. However, some highly correlated channels remain, indicating that significant leakage and raw data reconstruction are still possible.
Another approach, as proposed in [
44,
45] by Vepakomma et al., aims to bolster privacy within SL by introducing a loss term based on distance correlation into the overall loss function. Distance correlation (DC), a measure of statistical dependence between random variables, is used to minimise the correlation between the original input and the smashed data. Jointly optimising the standard task loss and the DC loss term reduces the information available in the smashed data for raw data reconstruction while preserving the model's accuracy. It is important to note that adding the DC term to the server's loss function could itself pose privacy risks, potentially enabling attackers to reconstruct the original input data if they possess both the DC value and the smashed data transmitted over the network. To address this concern, Turina et al. [
40] introduce a client-based privacy protection method integrated into a hybrid FL-SL framework. This approach employs two distinct loss functions, one dedicated to the clients and the other to the server. The first loss function prioritises privacy, incorporating elements such as DC or differential privacy, and operates exclusively on the client side. The second (global) loss function is computed on the server and drives the training of both clients and server. Empirical evidence demonstrates the effectiveness of this approach in maintaining data privacy in both hybrid FL-SL and parallel SL setups. Moreover, the client-based privacy approach employing DC outperforms the noise-based approach in balancing privacy and model accuracy.
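A condensed sketch of the loss combination used in this line of work is given below: a differentiable (biased) distance-correlation term between the raw inputs and the smashed data is added to the task loss with a weighting coefficient alpha. The estimator and the weighting are illustrative rather than the exact formulation of [44, 45].

```python
import torch

def dist_corr_sq(x, z, eps=1e-9):
    """Differentiable (biased) squared distance correlation between flattened batches."""
    def centred(a):
        d = torch.cdist(a, a)
        return d - d.mean(0, keepdim=True) - d.mean(1, keepdim=True) + d.mean()
    A, B = centred(x.flatten(1)), centred(z.flatten(1))
    dcov2_xy = (A * B).mean()
    dcov2_xx = (A * A).mean()
    dcov2_yy = (B * B).mean()
    return dcov2_xy / (torch.sqrt(dcov2_xx * dcov2_yy) + eps)

def client_loss(task_loss, x, smashed, alpha=0.1):
    # Penalise statistical dependence between the raw inputs and the smashed
    # data while still optimising the main task objective.
    return task_loss + alpha * dist_corr_sq(x, smashed)
```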
Another engineering-driven approach to minimise the information transmitted in SL is through the selective pruning of channels in the client-side smashed data, as demonstrated by Singh et al. in [
34]. Learning a pruning filter to selectively remove channels in the latent representation space at the split layer is empirically shown to prevent various state-of-the-art reconstruction attacks during the prediction step in private collaborative inference scenarios.
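The pruning idea can be sketched as a learnable per-channel gate at the split layer, trained with the task loss plus a sparsity penalty so that privacy-sensitive channels are driven towards zero; the gating mechanism and penalty below are an illustrative stand-in for the pruning filter of [34].

```python
import torch

class ChannelGate(torch.nn.Module):
    """Learnable per-channel gate applied to the smashed data at the split layer."""

    def __init__(self, num_channels):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(num_channels))

    def forward(self, smashed):                        # smashed: (B, C, H, W)
        gate = torch.sigmoid(self.logits).view(1, -1, 1, 1)
        return smashed * gate                          # soft pruning of channels

    def sparsity_penalty(self):
        # Pushes most gates towards zero so that few channels are actually transmitted.
        return torch.sigmoid(self.logits).sum()

# Illustrative client-side objective:
#   loss = task_loss + beta * gate.sparsity_penalty()
```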
Quantisation provides another avenue for decorrelating input and smashed data. Yu et al. [
46] introduce the Stepwise activation function to render activation outputs irreversible; the effectiveness of this approach depends on the Stepwise parameters, which trade accuracy off against privacy preservation. An extreme quantisation approach, known as Binarised SL (B-SL) and proposed by Pham et al. in [
29], binarises the local SL model, including the smashed data exposed to the server. Binarisation introduces latent noise into the smashed data, effectively diminishing the server's capacity to reconstruct the original training data. Furthermore, the authors incorporate an extra loss term alongside the standard accuracy loss to minimise the leakage of locally sensitive data. Note that the loss term in the B-SL framework is versatile and not restricted to the DC term used in [
44]. Additionally, the authors provide three methods for implementing differential privacy within the B-SL framework to ensure privacy guarantees. Experimental results reported in [
29] demonstrate the effectiveness of B-SL in mitigating privacy vulnerabilities under FSHA attacks.
In a different approach, Qiu et al. [
32] recommend hashing as a protective measure against reconstruction attacks. Their approach applies the Sign function to the smashed data before sending the result to the server, making data reconstruction exceedingly difficult. To preserve the model's trainability with the Sign function, the authors leverage batch normalisation and the straight-through estimator. Both techniques help reinforce the defence against reconstruction attacks while upholding high accuracy.
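A minimal sketch of this idea: the smashed data is batch-normalised and passed through a Sign function before transmission, while a straight-through estimator keeps the client model trainable. The clipping rule in the backward pass is a common STE choice and an assumption here, not necessarily the variant used in [32].

```python
import torch

class SignSTE(torch.autograd.Function):
    """Sign in the forward pass; straight-through (clipped identity) gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1 (a common STE clipping choice).
        return grad_output * (x.abs() <= 1).float()

class BinarisedSplitLayer(torch.nn.Module):
    def __init__(self, num_channels):
        super().__init__()
        self.bn = torch.nn.BatchNorm2d(num_channels)

    def forward(self, smashed):
        # The server only ever receives sign values, never the raw activations.
        return SignSTE.apply(self.bn(smashed))
```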
Noise-based mechanisms. Noise-based mechanisms offer a defence strategy that avoids the heavy computational burden of cryptographic primitives. These mechanisms adaptively inject noise into the smashed data while retaining the server's ability to perform its task. Noise is treated as an additional set of trainable parameter probabilities, which can be gradually eliminated through end-to-end self-supervised training [
4]. For instance, Shredder [
24], proposed by Mireshghallah et al., achieves an asymmetric balance between accuracy and privacy by adding noise as part of the gradient-based learning process, effectively reducing the information content of the smashed data sent by clients to servers for inference. Similarly, Abuadbba et al. [
1] and Titcombe et al. [
37] apply noise to the smashed data before transmitting it to the server, framing this defence as a differential privacy (DP) mechanism [
6].
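A minimal sketch of this kind of client-side defence: the smashed data is clipped and perturbed with Laplace noise before transmission, in the spirit of the DP framing in [37]; the clipping bound and noise scale are illustrative parameters with no calibrated privacy guarantee.

```python
import torch

def noisy_smashed(smashed, clip=1.0, scale=0.5):
    """Clip the smashed data and add Laplace noise before sending it to the server."""
    clipped = torch.clamp(smashed, -clip, clip)
    noise = torch.distributions.Laplace(0.0, scale).sample(clipped.shape)
    return clipped + noise

# Client side (illustrative):
#   z = client_model(x)
#   send_to_server(noisy_smashed(z))
```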
In another approach, Mahawaga Arachchige et al. [
23] provide a differentially private mechanism for sharing activations following a flattening layer that comes after convolutional and pooling layers. These flattened outputs are binarised, and a utility-enhanced randomisation mechanism, inspired by RAPPOR [
9], is applied to create a differentially private binary representation. These binary representations are then sent to the server, where fully connected layers perform final predictions. Vepakomma et al. [
41] propose PrivateMail, a differentially private mechanism for supervised manifold embedding of features extracted from deep networks for image retrieval tasks. PrivateMail is claimed to achieve a substantially improved balance between privacy and utility compared to several baselines. More recently, Ryu et al. [
33] conduct a systematic study to assess the effectiveness of DP in collaborative inference against reconstruction attacks. In summary, noise-based mechanisms can defend against data leakage, whether directly from the smashed data or via reconstruction attacks. However, adding noise can significantly degrade the model's accuracy, even at modest noise levels [
37].
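The randomisation step applied to binarised activations can be illustrated with basic randomised response: each bit is kept with a probability determined by the privacy budget epsilon and flipped otherwise. This is a simplified stand-in for the utility-enhanced, RAPPOR-inspired mechanism of [23].

```python
import numpy as np

def randomised_response(bits, epsilon):
    """Flip each binary activation with probability 1 / (1 + e^epsilon).

    Keeping a bit with probability e^epsilon / (1 + e^epsilon) satisfies
    epsilon-local differential privacy for that bit.
    """
    keep_prob = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    keep = np.random.rand(*bits.shape) < keep_prob
    return np.where(keep, bits, 1 - bits)

# Example: binarised flattened activations sent to the server.
binarised = (np.random.randn(1, 256) > 0).astype(int)
private_bits = randomised_response(binarised, epsilon=2.0)
```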
Protecting from model inversion attacks. DP, while effective in preventing data leakage, often comes at the cost of model accuracy. Recently, Pham et al. in [
31] develop a new SL framework in which client-side data privacy is enhanced without the need for sequential data sharing between clients. By disallowing the sharing of local models among clients, the risk of local models being inverted for data reconstruction is reduced. The authors demonstrate that this non-local-sharing SL can reduce leakage due to model inversion attacks by half, though attackers can still attempt to reconstruct private data by modifying the Deep Leakage attack [
47].
To safeguard hybrid SL-FL from model inversion threats, Li et al. [
21] propose a model inversion-resistant framework called ResSFL. ResSFL involves two key steps: an initial pre-training phase that constructs a feature extractor designed to withstand model inversion, followed by a subsequent resistance transfer phase that employs this feature extractor to initialise client-side models. During the pre-training phase, an attacker-aware training technique is employed, mimicking an attacker with a robust inversion model and introducing bottleneck layers to limit the feature space. Typically, this pre-training is conducted by an expert, often a powerful client or a third party with sufficient computational resources. In the second phase, the robust feature extractor is utilised to initialise the SL-FL training scheme for a new task. In another study, Khowaja et al. [
18] propose a method that segments raw data into patches, making recovery of the original data more challenging. The authors emphasise the growing concern regarding model security and suggest that their proximal gradient-based learning networks can effectively thwart model inversion attacks. Results from these studies indicate that the reconstructed data often fails to yield meaningful information.
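The attacker-aware pre-training phase can be sketched as a two-player loop: a simulated inversion network is trained to reconstruct inputs from the client-side features, while the feature extractor is trained to solve the task and to maximise that inversion error. The modules, coefficients, and alternation schedule below are illustrative and do not reproduce the exact ResSFL procedure.

```python
import torch
import torch.nn.functional as F

# Illustrative components; the narrow 64-dimensional output plays the role of a bottleneck.
feature_extractor = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 64))  # client side
task_head = torch.nn.Linear(64, 10)                                                    # server side
inversion_sim = torch.nn.Linear(64, 784)                                               # simulated attacker

opt_main = torch.optim.Adam(list(feature_extractor.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_inv = torch.optim.Adam(inversion_sim.parameters(), lr=1e-3)

def attacker_aware_step(x, y, lam=0.5):
    # 1. Train the simulated attacker to invert the current features.
    z = feature_extractor(x).detach()
    inv_loss = F.mse_loss(inversion_sim(z), x.flatten(1))
    opt_inv.zero_grad(); inv_loss.backward(); opt_inv.step()

    # 2. Train the feature extractor (and task head) for the task while making inversion hard.
    z = feature_extractor(x)
    task_loss = F.cross_entropy(task_head(z), y)
    resist_loss = -F.mse_loss(inversion_sim(z), x.flatten(1))   # maximise the attacker's error
    opt_main.zero_grad(); (task_loss + lam * resist_loss).backward(); opt_main.step()
```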
Protecting from feature-space hijacking attacks. The division of the model into client and server parts in SL introduces a unique type of inference attack, enabling a malicious server to influence the client model’s training process and infer training data, as exemplified by the FSHA. Regarding defence mechanisms, Erdogan et al. [
7] argue that the direction of the client's parameter updates under FSHA is unrelated to the primary task. Consequently, introducing a small amount of erroneous data during training enables clients to monitor changes in the gradient information provided by the server, assisting in the detection of malicious behaviour. However, Fu et al. [
10] put forward an attack strategy aimed at circumventing this detection mechanism. They clarify that the malicious server in FSHA is fundamentally constructing an auto-encoder, a behaviour that clients can identify by comparing the expected model gradients with those of the auto-encoder.
Research conducted by Gawron and Stubbings [
12] highlights that DP might not provide adequate protection against FSHA. In their investigation, they apply FSHA to SL protected by DP using a client-side DP optimiser. The empirical findings suggest that while DP can delay the convergence of FSHA, this attack method still successfully reconstructs the client’s private data with a low margin of error at various DP settings. Furthermore, the authors explore the utilisation of dimensionality reduction techniques applied directly to the raw data before training as a privacy protection measure. This approach is found to partially mitigate FSHA but could impact model accuracy, especially when dealing with large datasets.