Search | arXiv e-print repository

Order Optimal Bounds for One-Shot Federated Learning over non-Convex Loss Functions

Authors: Arsalan Sharifnassab, Saber Salehkaleybar, S. Jamaloddin Golestani

Abstract: We consider the problem of federated learning in a one-shot setting in which there are $m$ machines, each observing $n$ sample functions from an unknown distribution on non-convex loss functions. Let $F:[-1,1]^d\to\mathbb{R}$ be the expected loss function with respect to this unknown distribution. The goal is to find an estimate of the minimizer of $F$. Based on its observations, each machine generates a signal of bounded length $B$ and sends it to a server. The server collects signals of all machines and outputs an estimate of the minimizer of $F$. We show that the expected loss of any algorithm is lower bounded by $\max\big(1/(\sqrt{n}(mB)^{1/d}), 1/\sqrt{mn}\big)$, up to a logarithmic factor. We then prove that this lower bound is order optimal in $m$ and $n$ by presenting a distributed learning algorithm, called Multi-Resolution Estimator for Non-Convex loss function (MRE-NC), whose expected loss matches the lower bound for large $mn$ up to polylogarithmic factors.

Submitted 6 February, 2024; v1 submitted 19 August, 2021; originally announced August 2021.

arXiv:2102.08814 [pdf, other]

Distributed Fair Scheduling for Information Exchange in Multi-Agent Systems

Authors: Majid Raeis, S. Jamaloddin Golestani

Abstract: Information exchange is a crucial component of many real-world multi-agent systems. However, the communication between the agents involves two major challenges: the limited bandwidth, and the shared communication medium between the agents, which restricts the number of agents that can simultaneously exchange information. While both of these issues need to be addressed in practice, the impact of the latter problem on the performance of the multi-agent systems has often been neglected. This becomes even more important when the agents' information or observations have different importance, in which case the agents require different priorities for accessing the medium and sharing their information. Representing the agents' priorities by fairness weights and normalizing each agent's share by the assigned fairness weight, the goal can be expressed as equalizing the agents' normalized shares of the communication medium. To achieve this goal, we adopt a queueing theoretic approach and propose a distributed fair scheduling algorithm for providing weighted fairness in single-hop networks. Our proposed algorithm guarantees an upper-bound on the normalized share disparity among any pair of agents. This can particularly improve the short-term fairness, which is important in real-time applications. Moreover, our scheduling algorithm adjusts itself dynamically to achieve a high throughput at the same time. The simulation results validate our claims and comparisons with the existing methods show our algorithm's superiority in providing short-term fairness, while achieving a high throughput.

Submitted 17 February, 2021; originally announced February 2021.

Comments: 9 pages, Accepted at ICAPS 2021

arXiv:1911.00731 [pdf, other]

Order Optimal One-Shot Distributed Learning

Authors: Arsalan Sharifnassab, Saber Salehkaleybar, S. Jamaloddin Golestani

Abstract: We consider distributed statistical optimization in one-shot setting, where there are $m$ machines each observing $n$ i.i.d. samples. Based on its observed samples, each machine then sends an $O(\log(mn))$-length message to a server, at which a parameter minimizing an expected loss is to be estimated. We propose an algorithm called Multi-Resolution Estimator (MRE) whose expected error is no larger than $\tilde{O}\big(m^{-{1}/{\max(d,2)}} n^{-1/2}\big)$, where $d$ is the dimension of the parameter space. This error bound meets existing lower bounds up to poly-logarithmic factors, and is thereby order optimal. The expected error of MRE, unlike existing algorithms, tends to zero as the number of machines ($m$) goes to infinity, even when the number of samples per machine ($n$) remains upper bounded by a constant. This property of the MRE algorithm makes it applicable in new machine learning paradigms where $m$ is much larger than $n$.

Submitted 2 November, 2019; originally announced November 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1905.04634

Journal ref: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

arXiv:1905.04634 [pdf, other]

One-Shot Federated Learning: Theoretical Limits and Algorithms to Achieve Them

Authors: Saber Salehkaleybar, Arsalan Sharifnassab, S. Jamaloddin Golestani

Abstract: We consider distributed statistical optimization in one-shot setting, where there are $m$ machines each observing $n$ i.i.d. samples. Based on its observed samples, each machine sends a $B$-bit-long message to a server. The server then collects messages from all machines, and estimates a parameter that minimizes an expected convex loss function. We investigate the impact of communication constraint, $B$, on the expected error and derive a tight lower bound on the error achievable by any algorithm. We then propose an estimator, which we call Multi-Resolution Estimator (MRE), whose expected error (when $B\ge\log mn$) meets the aforementioned lower bound up to poly-logarithmic factors, and is thereby order optimal. We also address the problem of learning under tiny communication budget, and present lower and upper error bounds when $B$ is a constant. The expected error of MRE, unlike existing algorithms, tends to zero as the number of machines ($m$) goes to infinity, even when the number of samples per machine ($n$) remains upper bounded by a constant. This property of the MRE algorithm makes it applicable in new machine learning paradigms where $m$ is much larger than $n$.

Submitted 30 December, 2019; v1 submitted 11 May, 2019; originally announced May 2019.

arXiv:1810.09180 [pdf, ps, other]

Fluctuation Bounds for the Max-Weight Policy, with Applications to State Space Collapse

Authors: Arsalan Sharifnassab, John N. Tsitsiklis, S. Jamaloddin Golestani

Abstract: We consider a multi-hop switched network operating under a Max-Weight (MW) scheduling policy, and show that the distance between the queue length process and a fluid solution remains bounded by a constant multiple of the deviation of the cumulative arrival process from its average. We then exploit this result to prove matching upper and lower bounds for the time scale over which additive state space collapse (SSC) takes place. This implies, as two special cases, an additive SSC result in diffusion scaling under non-Markovian arrivals and, for the case of i.i.d. arrivals, an additive SSC result over an exponential time scale.

Submitted 12 June, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

arXiv:1707.01063 [pdf, ps, other]

doi 10.1016/j.sigpro.2020.107653

On the relaxed maximum-likelihood blind MIMO channel estimation for orthogonal space-time block codes

Authors: Kamran Kalbasi, S. Jamaloddin Golestani

Abstract: This paper concerns the maximum-likelihood channel estimation for MIMO systems with orthogonal space-time block codes when the finite alphabet constraint of the signal constellation is relaxed. We study the channel coefficients estimation subspace generated by this method. We provide an algebraic characterisation of this subspace which turns the optimization problem into a purely algebraic one and more importantly, leads to several interesting analytical proofs. We prove that with probability one, the dimension of the estimation subspace for the channel coefficients is deterministic and it decreases by increasing the number of receive antennas up to a certain critical number of receive antennas, after which the dimension remains constant. In fact, we show that beyond this critical number of receive antennas, the estimation subspace for the channel coefficients is isometric to a fixed deterministic invariant space which can be easily computed for every specific OSTB code.

Submitted 25 May, 2020; v1 submitted 4 July, 2017; originally announced July 2017.

MSC Class: 94A05; 94A12; 94A15

arXiv:1703.08838 [pdf, other]

Distributed Voting/Ranking with Optimal Number of States per Node

Authors: Saber Salehkaleybar, Arsalan Sharif-Nassab, S. Jamaloddin Golestani

Abstract: Considering a network with $n$ nodes, where each node initially votes for one (or more) choices out of $K$ possible choices, we present a Distributed Multi-choice Voting/Ranking (DMVR) algorithm to determine either the choice with maximum vote (the voting problem) or to rank all the choices in terms of their acquired votes (the ranking problem). The algorithm consolidates node votes across the network by updating the states of interacting nodes using two key operations, the union and the intersection. The proposed algorithm is simple, independent from network size, and easily scalable in terms of the number of choices $K$, using only $K\times 2^{K-1}$ nodal states for voting, and $K\times K!$ nodal states for ranking. We prove the number of states to be optimal in the ranking case, this optimality is conjectured to also apply to the voting case. The time complexity of the algorithm is analyzed in complete graphs. We show that the time complexity for both ranking and voting is $O(\log(n))$ for given vote percentages, and is inversely proportional to the minimum of the vote percentage differences among various choices.

Submitted 26 March, 2017; originally announced March 2017.

arXiv:1703.08831 [pdf, other]

Token-based Function Computation with Memory

Authors: Saber Salehkaleybar, S. Jamaloddin Golestani

Abstract: In distributed function computation, each node has an initial value and the goal is to compute a function of these values in a distributed manner. In this paper, we propose a novel token-based approach to compute a wide class of target functions to which we refer as "Token-based function Computation with Memory" (TCM) algorithm. In this approach, node values are attached to tokens and travel across the network. Each pair of travelling tokens would coalesce when they meet, forming a token with a new value as a function of the original token values. In contrast to the Coalescing Random Walk (CRW) algorithm, where token movement is governed by random walk, meeting of tokens in our scheme is accelerated by adopting a novel chasing mechanism. We proved that, compared to the CRW algorithm, the TCM algorithm results in a reduction of time complexity by a factor of at least $\sqrt{n/\log(n)}$ in Erdös-Renyi and complete graphs, and by a factor of $\log(n)/\log(\log(n))$ in torus networks. Simulation results show that there is at least a constant factor improvement in the message complexity of TCM algorithm in all considered topologies. Robustness of the CRW and TCM algorithms in the presence of node failure is analyzed. We show that their robustness can be improved by running multiple instances of the algorithms in parallel.

Submitted 26 March, 2017; originally announced March 2017.

Showing 1–8 of 8 results for author: Golestani, J