-
PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning
Authors:
Sizai Hou,
Songze Li,
Tayyebeh Jahani-Nezhad,
Giuseppe Caire
Abstract:
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data while preserving user privacy. However, the typical FL paradigm faces challenges of both privacy and robustness: the transmitted model updates can leak sensitive user information, and the lack of central control over the local training process leaves the global model susceptible to malicious manipulation of model updates. Current solutions attempting to address both problems in the single-server FL setting fall short in two respects: 1) they are designed for simple validity checks that are insufficient against advanced attacks (e.g., checking the norm of each individual update); and 2) they incur partial privacy leakage for more complicated robust aggregation algorithms (e.g., distances between model updates are leaked for multi-Krum). In this work, we formalize a novel security notion, aggregated privacy, that characterizes the minimum amount of user information, in the form of aggregated statistics of users' updates, that must be revealed to accomplish more advanced robust aggregation. We develop a general framework, PriRoAgg, which uses Lagrange coded computing and distributed zero-knowledge proofs to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy. As concrete instantiations of PriRoAgg, we construct two secure and robust protocols based on state-of-the-art robust algorithms, and provide full theoretical analyses of their security and complexity. Extensive experiments demonstrate the robustness of these protocols against various model integrity attacks and their efficiency advantages over baselines.
Submitted 11 July, 2024;
originally announced July 2024.
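The PriRoAgg abstract cites multi-Krum as a robust aggregation rule whose inter-update distances leak in earlier schemes. The sketch below is a minimal, non-private multi-Krum in plain Python, included only to show which statistics (pairwise squared distances between updates) such a rule consumes. It is not PriRoAgg: it performs no secret sharing, Lagrange coded computing, or zero-knowledge proofs, and the parameter names `f` (assumed number of Byzantine users) and `m` (number of selected updates) are illustrative.

```python
from typing import List

Vector = List[float]


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two model updates."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def multi_krum(updates: List[Vector], f: int, m: int) -> Vector:
    """Plain (non-private) multi-Krum: average the m updates with the lowest Krum scores.

    Each score sums the squared distances from an update to its n - f - 2
    nearest neighbours, where f is the assumed number of Byzantine users.
    """
    n = len(updates)
    k = n - f - 2
    if k < 1 or not 1 <= m <= n:
        raise ValueError("need n - f - 2 >= 1 and 1 <= m <= n")
    # All pairwise distances: exactly the statistics a private protocol must protect.
    dist = [[sq_dist(updates[i], updates[j]) for j in range(n)] for i in range(n)]
    scores = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        scores.append(sum(nearest))
    chosen = sorted(range(n), key=lambda i: scores[i])[:m]
    dim = len(updates[0])
    return [sum(updates[i][d] for i in chosen) / m for d in range(dim)]
```

For example, with four clustered honest updates near `[1, 1]` and one outlier at `[50, -50]`, `multi_krum(updates, f=1, m=3)` averages three honest updates and ignores the outlier.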
-
Fundamental Limits of Multi-Message Private Computation
Authors:
Ali Gholami,
Kai Wan,
Tayyebeh Jahani-Nezhad,
Hua Sun,
Mingyue Ji,
Giuseppe Caire
Abstract:
In a typical formulation of the private information retrieval (PIR) problem, a single user wishes to retrieve one out of $K$ files from $N$ servers without revealing the demanded file index to any server. This paper formulates an extended model of PIR, referred to as multi-message private computation (MM-PC), where instead of retrieving a single file, the user wishes to retrieve $P>1$ linear combinations of files while preserving the privacy of the demand information. The MM-PC problem generalizes both the private computation (PC) problem (where the user requests one linear combination of the files) and the multi-message private information retrieval (MM-PIR) problem (where the user requests $P>1$ files). A baseline achievable scheme either repeats the optimal PC scheme by Sun and Jafar $P$ times, or treats each possible demanded linear combination as an independent file and then uses the near-optimal MM-PIR scheme by Banawan and Ulukus. In this paper, we propose a new MM-PC scheme that significantly improves upon these baseline schemes. In doing so, we design the queries inspired by the structure of the cache-aided scalar linear function retrieval scheme by Wan et al., which leverages the dependency between linear functions to reduce the amount of communication. To ensure the decodability of our scheme, we propose a new method to exploit this dependency, referred to as the sign assignment step. Finally, we use Maximum Distance Separable (MDS) matrices to code the queries, which reduces the download from the servers while preserving privacy. With the proposed schemes, we characterize the capacity within a multiplicative factor of $2$.
Submitted 23 August, 2024; v1 submitted 9 May, 2023;
originally announced May 2023.
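To make the privacy requirement concrete, the sketch below implements the textbook two-server XOR scheme, not the MM-PC scheme of this paper. Files are modelled as integers (bit-strings), the servers are assumed not to collude, and "linear combination" is restricted to XOR over GF(2). The user sends a uniformly random subset to server 1 and the same subset with the target indices flipped to server 2; each server alone sees a uniformly random subset, and XORing the two answers yields the XOR of the target files. This simple scheme downloads two answers per retrieved symbol (rate 1/2), below the Sun and Jafar capacity for two servers; the function names are hypothetical.

```python
import secrets
from typing import List, Set, Tuple


def make_queries(num_files: int, target: Set[int]) -> Tuple[Set[int], Set[int]]:
    """User side: a uniformly random subset for server 1, and the same subset
    with the membership of every index in `target` flipped for server 2."""
    s1 = {i for i in range(num_files) if secrets.randbits(1)}
    s2 = s1 ^ target  # symmetric difference; still uniformly random on its own
    return s1, s2


def answer(files: List[int], subset: Set[int]) -> int:
    """Server side: XOR of the requested files."""
    out = 0
    for i in subset:
        out ^= files[i]
    return out


def retrieve_xor_combination(files: List[int], target: Set[int]) -> int:
    """Recover the XOR of the files indexed by `target` from two non-colluding servers.

    The two answers cancel on s1 & s2, leaving the XOR over s1 ^ s2 == target.
    """
    s1, s2 = make_queries(len(files), target)
    return answer(files, s1) ^ answer(files, s2)
```

Retrieving a single file is the special case `target = {k}`; requesting several combinations by repeating this procedure corresponds to the kind of repetition baseline the abstract improves upon.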
-
ByzSecAgg: A Byzantine-Resistant Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment
Authors:
Tayyebeh Jahani-Nezhad,
Mohammad Ali Maddah-Ali,
Giuseppe Caire
Abstract:
In this paper, we propose ByzSecAgg, an efficient secure aggregation scheme for federated learning that is protected against Byzantine attacks and privacy leakage. Processing individual updates to manage adversarial behavior, while preserving the privacy of data against colluding nodes, requires some form of secure secret sharing. However, the communication load for secret sharing of long update vectors can be very high. ByzSecAgg solves this problem by partitioning local updates into smaller sub-vectors and sharing them using ramp secret sharing. However, this sharing method does not admit bilinear computations, such as the pairwise distance calculations needed by outlier-detection algorithms. To overcome this issue, each user runs another round of ramp sharing with a different embedding of the data in the sharing polynomial. This technique, motivated by ideas from coded computing, enables secure computation of pairwise distances. In addition, to maintain the integrity and privacy of the local update, ByzSecAgg also uses a vector commitment method in which the commitment size remains constant (i.e., does not increase with the length of the local update), while simultaneously allowing verification of the secret sharing process. In terms of communication loads, ByzSecAgg significantly outperforms the state-of-the-art scheme, known as BREA.
Submitted 2 June, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
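The ramp secret sharing step described in the ByzSecAgg abstract can be sketched as follows, under stated assumptions: arithmetic is over the prime field of size 2^31 - 1 (an illustrative choice), the $m$ sub-vector symbols occupy the low-order coefficients of the sharing polynomial, $T$ uniformly random coefficients are appended, and user $i$ receives the evaluation at $x = i$. Any $m+T$ shares recover the data by Lagrange interpolation, while any $T$ shares alone are uniformly distributed. This shows only the first sharing round; the paper's second embedding for bilinear computation and its vector commitments are not shown, and all function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

P = 2_147_483_647  # prime modulus 2^31 - 1 (illustrative choice)


def share(vec: List[int], m: int, T: int, N: int,
          rng: Optional[random.Random] = None) -> List[Tuple[int, int]]:
    """Ramp-share a length-m vector among N users.

    Polynomial coefficients = m data symbols followed by T random masks;
    user i (1..N) receives (i, F(i)).
    """
    if len(vec) != m or N < m + T:
        raise ValueError("need len(vec) == m and N >= m + T")
    rng = rng or random.SystemRandom()
    coeffs = [v % P for v in vec] + [rng.randrange(P) for _ in range(T)]

    def F(x: int) -> int:
        y = 0
        for c in reversed(coeffs):  # Horner's rule, high degree first
            y = (y * x + c) % P
        return y

    return [(x, F(x)) for x in range(1, N + 1)]


def poly_mul_linear(poly: List[int], a: int) -> List[int]:
    """Multiply a polynomial (low-to-high coefficients) by (x - a) mod P."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - a * c) % P
        out[i + 1] = (out[i + 1] + c) % P
    return out


def reconstruct(shares: List[Tuple[int, int]], m: int) -> List[int]:
    """Recover all coefficients from exactly m + T shares; return the m data symbols."""
    coeffs = [0] * len(shares)
    for j, (xj, yj) in enumerate(shares):
        basis, denom = [1], 1
        for k, (xk, _) in enumerate(shares):
            if k != j:
                basis = poly_mul_linear(basis, xk)
                denom = denom * (xj - xk) % P
        scale = yj * pow(denom, P - 2, P) % P  # division via Fermat inverse
        for i, c in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * c) % P
    return coeffs[:m]
```

Compared with sharing each symbol separately, packing $m$ symbols into one polynomial divides the per-symbol share size by roughly $m$, which is the communication saving the abstract attributes to ramp sharing.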
-
SwiftAgg+: Achieving Asymptotically Optimal Communication Loads in Secure Aggregation for Federated Learning
Authors:
Tayyebeh Jahani-Nezhad,
Mohammad Ali Maddah-Ali,
Songze Li,
Giuseppe Caire
Abstract:
We propose SwiftAgg+, a novel secure aggregation protocol for federated learning systems, where a central server aggregates local models of $N \in \mathbb{N}$ distributed users, each of size $L \in \mathbb{N}$, trained on their local data, in a privacy-preserving manner. SwiftAgg+ can significantly reduce the communication overheads without any compromise on security, and achieves optimal communication loads within diminishing gaps. Specifically, in the presence of at most $D=o(N)$ dropout users, SwiftAgg+ achieves a per-user communication load of $(1+\mathcal{O}(\frac{1}{N}))L$ symbols and a server communication load of $(1+\mathcal{O}(\frac{1}{N}))L$ symbols, with a worst-case information-theoretic security guarantee against any subset of up to $T=o(N)$ semi-honest users who may also collude with the curious server. Moreover, SwiftAgg+ allows for a flexible trade-off between communication loads and the number of active communication links. In particular, for $T<N-D$ and for any $K\in\mathbb{N}$, SwiftAgg+ can achieve a server communication load of $(1+\frac{T}{K})L$ symbols and a per-user communication load of up to $(1+\frac{T+D}{K})L$ symbols, where the number of pairwise active connections in the network is $\frac{N}{2}(K+T+D+1)$.
Submitted 8 September, 2022; v1 submitted 24 March, 2022;
originally announced March 2022.
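The trade-off stated in the SwiftAgg+ abstract can be explored numerically. The helper below only evaluates the three formulas quoted there (server load, worst-case per-user load, number of pairwise active links) with exact fractions; it does not implement the protocol, and the function name is hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def swiftagg_plus_tradeoff(N: int, L: int, T: int, D: int,
                           K: int) -> Tuple[Fraction, Fraction, Fraction]:
    """Evaluate the SwiftAgg+ load/connectivity formulas from the abstract.

    Returns (server load, worst-case per-user load, pairwise active links);
    loads are in symbols. Exact fractions avoid floating-point rounding.
    """
    if not (T < N - D and K >= 1):
        raise ValueError("the stated trade-off assumes T < N - D and K >= 1")
    server_load = (1 + Fraction(T, K)) * L
    per_user_load = (1 + Fraction(T + D, K)) * L
    active_links = Fraction(N, 2) * (K + T + D + 1)
    return server_load, per_user_load, active_links
```

For example, with $N=100$, $L=1000$, $T=D=2$ and $K=10$, the formulas give a server load of 1200 symbols, a per-user load of at most 1400 symbols, and 750 active links. Increasing $K$ pushes both loads toward $L$ at the cost of links growing linearly in $K$.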
-
SwiftAgg: Communication-Efficient and Dropout-Resistant Secure Aggregation for Federated Learning with Worst-Case Security Guarantees
Authors:
Tayyebeh Jahani-Nezhad,
Mohammad Ali Maddah-Ali,
Songze Li,
Giuseppe Caire
Abstract:
We propose SwiftAgg, a novel secure aggregation protocol for federated learning systems, where a central server aggregates local models of $N$ distributed users, each of size $L$, trained on their local data, in a privacy-preserving manner. Compared with state-of-the-art secure aggregation protocols, SwiftAgg significantly reduces the communication overheads without any compromise on security. Specifically, in the presence of at most $D$ dropout users, SwiftAgg achieves a users-to-server communication load of $(T+1)L$ and a users-to-users communication load of up to $(N-1)(T+D+1)L$, with a worst-case information-theoretic security guarantee against any subset of up to $T$ semi-honest users who may also collude with the curious server. The key idea of SwiftAgg is to partition the users into groups of size $D+T+1$; in the first phase, secret sharing and aggregation of the individual models are performed within each group, and in the second phase, model aggregation is performed on $D+T+1$ sequences of users across the groups. If a user in a sequence drops out in the second phase, the rest of the sequence remains silent. This design allows only a subset of users to communicate with each other, and only the users in a single group to communicate directly with the server, eliminating two requirements of other secure aggregation protocols: 1) an all-to-all communication network across users; and 2) all users communicating with the server. This substantially reduces the communication costs of the system.
Submitted 29 April, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
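The grouping step and the loads quoted in the SwiftAgg abstract can be sketched as below. Two assumptions are not taken from the abstract and are labeled as hypothetical: $N$ is taken to be a multiple of $D+T+1$, and sequence $j$ is formed by visiting the $j$-th member of every group in order. The paper's exact sequence construction, secret sharing, and dropout handling are not reproduced; the load function only evaluates the stated formulas.

```python
from typing import List, Tuple


def swiftagg_layout(N: int, T: int, D: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Split users 0..N-1 into groups of size D+T+1 and form D+T+1 cross-group sequences.

    Hypothetical layout: sequence j takes the j-th member of every group.
    Assumes N is a multiple of D+T+1 for simplicity.
    """
    g = D + T + 1
    if N % g != 0:
        raise ValueError("sketch assumes N is a multiple of D + T + 1")
    groups = [list(range(start, start + g)) for start in range(0, N, g)]
    sequences = [[group[j] for group in groups] for j in range(g)]
    return groups, sequences


def swiftagg_loads(N: int, L: int, T: int, D: int) -> Tuple[int, int]:
    """Users-to-server load and worst-case users-to-users load, as stated in the abstract."""
    return (T + 1) * L, (N - 1) * (T + D + 1) * L
```

With $N=12$ and $T=D=1$, the users form four groups of three, and under the assumed layout the first sequence is users 0, 3, 6, 9. Only the users of one group talk to the server, which is why the users-to-server load depends on $T$ but not on $N$.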
-
Optimal Communication-Computation Trade-Off in Heterogeneous Gradient Coding
Authors:
Tayyebeh Jahani-Nezhad,
Mohammad Ali Maddah-Ali
Abstract:
Gradient coding allows a master node to derive the aggregate of the partial gradients, calculated by worker nodes over their local data sets, with minimum communication cost and in the presence of stragglers. In this paper, for gradient coding with linear encoding, we characterize the optimum communication cost for heterogeneous distributed systems with arbitrary data placement, with $s \in \mathbb{N}$ stragglers and $a \in \mathbb{N}$ adversarial nodes. In particular, we show that the optimum communication cost, normalized by the size of the gradient vectors, is equal to $(r-s-2a)^{-1}$, where $r \in \mathbb{N}$ is the minimum number of times a data partition is replicated. In other words, the communication cost is determined by the data partition with the minimum replication, irrespective of the structure of the placement. The proposed achievable scheme also allows us to target the computation of a polynomial function of the aggregated gradient matrix. It also allows us to borrow ideas from approximate computing and propose an approximate gradient coding scheme for cases where the repetition in data placement is smaller than what is needed to meet the restriction imposed on communication cost, or where the number of stragglers exceeds the value presumed in the system design.
Submitted 2 March, 2021;
originally announced March 2021.
-
Berrut Approximated Coded Computing: Straggler Resistance Beyond Polynomial Computing
Authors:
Tayyebeh Jahani-Nezhad,
Mohammad Ali Maddah-Ali
Abstract:
One of the major challenges in using distributed learning to train complicated models with large data sets is dealing with the straggler effect. As a solution, coded computation has recently been proposed to efficiently add redundancy to the computation tasks. In this technique, coding is applied across data sets and computation is done over coded data, such that the results of an arbitrary subset of worker nodes of a certain size suffice to recover the final results. The major challenges with these approaches are that (1) they are limited to polynomial function computations, (2) the size of the subset of servers that we need to wait for grows with the product of the size of the data set and the model complexity (the degree of the polynomial), which can be prohibitively large, and (3) they are not numerically stable for computation over real numbers. In this paper, we propose Berrut Approximated Coded Computing (BACC) as an alternative approach that is not limited to polynomial function computation. In addition, the master node can approximately calculate the final results using the outcomes of any arbitrary subset of available worker nodes. The approximation approach is proven to be numerically stable with low computational complexity. In addition, the accuracy of the approximation is established theoretically and verified by simulation results in different settings, such as distributed learning problems. In particular, BACC is used to train a deep neural network on a cluster of servers, where it outperforms repetitive computation (repetition coding) in terms of the rate of convergence.
Submitted 1 November, 2021; v1 submitted 17 September, 2020;
originally announced September 2020.
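The core primitive behind BACC is Berrut's rational interpolant, $r(z) = \sum_i \frac{(-1)^i f_i}{z-x_i} \big/ \sum_i \frac{(-1)^i}{z-x_i}$, which has no real poles when the nodes are sorted and accepts any subset of nodes. The sketch below is a simplified, scalar illustration in that spirit, under assumptions not taken from the abstract: data points sit at Chebyshev points of the first kind, workers evaluate the encoder at a second set of Chebyshev points, and the master decodes from whichever workers returned. It is not the authors' implementation, and the function names are hypothetical.

```python
import math
from typing import Callable, Iterable, List, Sequence


def chebyshev_points(n: int) -> List[float]:
    """Chebyshev points of the first kind on [-1, 1], in decreasing order."""
    return [math.cos((2 * j + 1) * math.pi / (2 * n)) for j in range(n)]


def berrut_interpolant(xs: Sequence[float], fs: Sequence[float]) -> Callable[[float], float]:
    """Berrut's first rational interpolant with weights (-1)^i.

    xs must be sorted (increasing or decreasing) so the interpolant has no real poles.
    """
    def r(z: float) -> float:
        num = den = 0.0
        for i, (x, f) in enumerate(zip(xs, fs)):
            if z == x:  # exact interpolation at the nodes
                return f
            w = (-1) ** i / (z - x)
            num += w * f
            den += w
        return num / den
    return r


def bacc_sketch(data: Sequence[float], f: Callable[[float], float],
                n_workers: int, available: Iterable[int]) -> List[float]:
    """Encode data with a Berrut interpolant, let available workers apply f, decode approximately."""
    alphas = chebyshev_points(len(data))
    betas = chebyshev_points(n_workers)
    encoder = berrut_interpolant(alphas, data)
    # Each non-straggling worker i applies f to its coded input encoder(betas[i]).
    results = {i: f(encoder(betas[i])) for i in available}
    idx = sorted(results)  # keeps the returned nodes in sorted order
    decoder = berrut_interpolant([betas[i] for i in idx], [results[i] for i in idx])
    return [decoder(a) for a in alphas]  # approximations of f(data[j])
```

Because the decoder only needs the returned evaluations, `available` can be any subset of workers; the approximation quality, which the paper analyzes, is not asserted here.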
-
CodedSketch: A Coding Scheme for Distributed Computation of Approximated Matrix Multiplication
Authors:
Tayyebeh Jahani-Nezhad,
Mohammad Ali Maddah-Ali
Abstract:
In this paper, we propose CodedSketch, a distributed straggler-resistant scheme to compute an approximation of the product of two massive matrices. The objective is to reduce the recovery threshold, defined as the total number of worker nodes that we need to wait for to be able to recover the final result. To exploit the fact that only an approximate result is required in reducing the recovery threshold, some form of pre-compression is required. However, compression inherently involves some randomness that would destroy the structure of the matrices. On the other hand, exploiting the structure of the matrices is crucial to reducing the recovery threshold. In CodedSketch, we use count-sketch, a hash-based compression scheme, on the rows of the first matrix and the columns of the second matrix, and a structured polynomial code on the columns of the first matrix and the rows of the second matrix. This arrangement allows us to exploit the gains of both in reducing the recovery threshold. To increase the accuracy of computation, multiple independent count-sketches are needed. This independence allows us to theoretically characterize the accuracy of the result and establish the recovery threshold achieved by the proposed scheme. To guarantee the independence of the resulting count-sketches in the output, while keeping their cost on the recovery threshold to a minimum, we use another layer of structured codes.
Submitted 12 February, 2021; v1 submitted 26 December, 2018;
originally announced December 2018.
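Count-sketch, the hash-based compression CodedSketch applies to rows of the first matrix and columns of the second, is shown below for a single vector. Each of several independent repetitions hashes index $i$ to a bucket $h(i)$ with a random sign $s(i)$, accumulates $s(i)x_i$ there, and estimates $x_i$ as the median over repetitions of $s(i)$ times its bucket. Assumptions: hash and sign functions are fully random lookup tables drawn from a seeded generator (a real deployment would use pairwise-independent hash families), and the matrix-product and polynomial-coding layers of CodedSketch are not shown. The class name is hypothetical.

```python
import random
import statistics
from typing import List, Sequence


class CountSketch:
    """Count-sketch of length-n vectors with `reps` independent repetitions."""

    def __init__(self, n: int, buckets: int, reps: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random lookup tables stand in for hash functions h_r: [n] -> [buckets]
        # and sign functions s_r: [n] -> {-1, +1}.
        self.h = [[rng.randrange(buckets) for _ in range(n)] for _ in range(reps)]
        self.s = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(reps)]
        self.buckets = buckets

    def sketch(self, x: Sequence[float]) -> List[List[float]]:
        """Compress x into `reps` tables of `buckets` signed sums."""
        tables = []
        for h, s in zip(self.h, self.s):
            table = [0.0] * self.buckets
            for i, xi in enumerate(x):
                table[h[i]] += s[i] * xi
            tables.append(table)
        return tables

    def estimate(self, tables: List[List[float]], i: int) -> float:
        """Median over repetitions of the signed bucket value for index i."""
        return statistics.median(s[i] * t[h[i]] for h, s, t in zip(self.h, self.s, tables))
```

The median over independent repetitions is what makes the accuracy analyzable, which mirrors the abstract's point that multiple independent count-sketches are needed; a vector with a single nonzero entry is recovered exactly at that entry because no other mass collides with it.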
-
Performance Analysis of Molecular Spatial Modulation (MSM) in Diffusion based Molecular MIMO Communication Systems
Authors:
Tayyebeh Jahani-Nezhad,
Foroogh S. Tabataba
Abstract:
In diffusion-based molecular communication, information is transferred from a transmitter to a receiver using molecular carriers. The low achievable data rate is the main disadvantage of diffusion-based molecular communication compared with radio-based communication. One solution to overcome this disadvantage is molecular MIMO communication. In this paper, we introduce molecular spatial modulation (MSM) in molecular MIMO communication to increase the data rate of the system. We also use special detection methods, all of which are based on the threshold level detection method. These methods exploit diversity techniques in molecular communication systems if the channel matrix that we introduce is full rank. For a $2\times1$ system, we define an optimization problem to obtain a suitable number of transmitted molecules that reduces the BER of the system. The proposed modulation is then generalized to $2\times2$ and $4\times4$ systems. In each of these systems, special detection methods based on threshold level detection are used. Finally, systems using MSM are fairly compared, in terms of BER, with systems that have similar data rates. The simulation results show that the proposed modulation and detection methods reduce BER, while remaining very simple and practical for molecular systems.
Submitted 16 September, 2018;
originally announced September 2018.
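The spatial modulation idea, where the index of the active transmitter carries the information bit, can be illustrated with a toy Monte Carlo simulation. This is a simplified model and not the paper's system: in a hypothetical $2\times2$ link, one bit selects which transmitter releases $Q$ molecules, each receiver independently captures each molecule with probability `p_direct` (paired receiver) or `p_cross` (other receiver), and the detector simply picks the receiver with the larger count, rather than the threshold level detection used in the paper. Intersymbol interference and diffusion dynamics are ignored, and all names and parameters are assumptions.

```python
import random


def simulate_msm_2x2(p_direct: float, p_cross: float, Q: int,
                     trials: int, seed: int = 0) -> float:
    """Estimate the BER of a toy 2x2 molecular spatial modulation link.

    One bit per symbol selects the transmitter; receiver counts are
    Binomial(Q, p) draws; detection compares the two counts.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        bit = rng.randrange(2)  # index of the active transmitter
        counts = []
        for rx in range(2):
            p = p_direct if rx == bit else p_cross
            counts.append(sum(rng.random() < p for _ in range(Q)))
        if counts[0] > counts[1]:
            detected = 0
        elif counts[1] > counts[0]:
            detected = 1
        else:
            detected = rng.randrange(2)  # tie: guess
        errors += detected != bit
    return errors / trials
```

When the direct paths dominate (for example `p_direct=0.5`, `p_cross=0.05`, `Q=100`), errors are essentially absent; when the channel matrix has identical rows (`p_direct == p_cross`), the active transmitter cannot be identified and the BER approaches 1/2, consistent with the abstract's full-rank condition on the channel matrix.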